CSV Generator
A simple-to-use script for generating CSV files with hundreds of columns and billions of rows.
Installation
To install globally:
npm i -g csv-generator # or yarn
yarn global add csv-generator
Since the command is on our global path, we can execute via:
gencsv --help
you can also install locally:
npm i csv-generator
yarn add csv-generator
we can run locally:
./node_modules/.bin/gencsv --help
You can set a soft link to your project root also:
ln -s gencsv ./node_modules/.bin/gencsv
./gencsv --help
Usage
You can use this script in three ways:
- Interactive CLI Interface
- CLI Interface
- API (used for embedding in your other scripts/modules)
Interactive Interface
The interactive interface is a series of questions and answers. This interface is not the default interface for this tool, but we can launch it via:
gencsv --interactive # or
gencsv -i
if you want always use the interactive interface, you can save this preference:
gencsv --always-interactive # or
gencsv -a
Don't need this interface anymore? No problem:
gencsv --clear-settings # or
gencsv -c
You can also run the script after clearing the settings if you wish:
gencsv -c foo.csv < columns.txt
CLI Interface
Resembling other commands like ls
, this is the default feel:
gencsv [options] <file> [columns..]
gencsv [options] <file> < columns.txt
Commands:
<file> The output file name
Options:
--chunk The number of rows to generate per pass
[number] [default: 1000]
--functions, --func Lists available functions
--rows, -r The number of rows to generate (e.g. 100, 100000,
100k, 1M, 1B, etc.) [default: 100]
--use-headers, -h Use this flag to set column headers
[boolean] [default: false]
--interactive, -i Run the script in interactive mode with a series of
questions and user-provided answers
--always-interactive, -a Start the script in interactive mode and save this
setting
--always-use-headers Use this flag to persist column headers preferences
--clear-settings, -c Clear always-interactive settings
--silent, -s Minimal console output [boolean] [default: false]
--help Show help [boolean]
--version Show version number [boolean]
We can define our tables two ways:
Manual entry
When issuing the command, be sure to separate functions by space — other delimiters are not supported with this method of entry.
gencsv foo.csv 'name email ccnumber date(2) pick(a|b)' # quote the argument to avoid escaping gencsv foo.csv name email ccnumber date\(2\) pick\(a\|b\) # or escape special BASH characters
This will generate a
.csv
file in our current working directory with five columns and 100 rows.Need more rows? Need a lot of rows? No problem.
gencsv foo.csv -r 100K 'name email ccnumber date(2) pick(AWS|Azure|Google Cloud|Digital Ocean)' # or gencsv foo.csv -r 100000 'name email ccnumber date(2) pick(AWS|Azure|Google Cloud|Digital Ocean)'
Same table, but this time with 100,000 rows!
Need even more? This script provides additional aliases:
K
+= 10^3 (e.g. 80K)M
+= 10^6 (e.g. 0.2M)B
+= 10^9 (e.g 2.13B)Want to interpolate BASH variables? Use double quotes:
gencsv foo.csv "name email ccnumber date(2) pick($USER|b)"
Columns definition as plain text
For wide tables, we can define our column definitions in a separate file:
gencsv foo.csv < columns.txt
where
columns.txt
contains:name, email, ccnumber, birthday, date, ...
We can define column headers in
columns.txt
as well:Name, Email Address, Credit Card Number, Birthday, Transaction Date, ... name, email, ccnumber, birthday, date, ...
API Interface
What's that? An API? Yes. As of 1.0.2 you can require
csv-generator
in your node app:
const gencsv = require('csv-generator');
gencsv('foo.csv', ['name'], {
rows: 100,
chunks: 10,
silent: true
}).then(
res => {
console.log(res);
},
e => {
console.error(e);
}
);
Performance Notes
By nature, some functions are considerably slower than others, for example:
Function | Time to Write 10K Rows by 100 Columns |
---|---|
string |
9.481s |
string(8) |
6.305s |
alpha |
6.138s |
alpha(8) |
4.061s |
sentence |
40.378s |
by contrast, these functions are much faster:
Function | Time to Write 10K Rows by 100 Columns |
---|---|
guid |
1.264s |
age |
0.72s |
integer |
0.684s |
zip |
2.579 |
yn |
0.706s |
If you need to generate large amounts of data for wide tables it is recommended to use fast functions.
More Notes
Instead of hundreds of async file writes (i.e. page faults), this generator uses a single stream to write content to the file. The trade-off is normal heap space, less CPU involvement, but more virtual memory used — memory is cheap; transistors aren't.
Vendor Copyright Notice
This tool uses some scripts that are copyright of Data Design Group Inc. For more information see the README file in lib/vendor
.
Developers
Want to add a feature? Have a useful generator? Need to debug? The readline
interface makes for cumbersome debugging, but node has our back:
node --inspect --inspect-brk index.js
or if you need to debug the CLI interface:
node --inspect --inspect-brk bin/gencsv foo.csv < columns.txt
open up Google Chrome and navigate to:
chrome://inspect
Running Tests
To run unit tests:
npm run coverage # or
yarn coverage