CSV Generator
A simple-to-use script for generating CSV files with hundreds of columns and billions of rows.
Installation
To install globally:
npm i -g csv-generator # or yarn
yarn global add csv-generator
Since the command is on our global path, we can execute via:
gencsv --help
You can also install locally:
npm i csv-generator
yarn add csv-generator
We can run locally:
./node_modules/.bin/gencsv --help
You can also create a soft link in your project root:
ln -s ./node_modules/.bin/gencsv gencsv
./gencsv --help
Usage
You can use this script in three ways:
- Interactive CLI Interface
- CLI Interface
- API (used for embedding in your other scripts/modules)
Interactive Interface
The interactive interface is a series of questions and answers. This interface is not the default interface for this tool, but we can launch it via:
gencsv --interactive # or
gencsv -i
If you always want to use the interactive interface, you can save this preference:
gencsv --always-interactive # or
gencsv -a
Don't need this interface anymore? No problem:
gencsv --clear-settings # or
gencsv -c
You can also run the script after clearing the settings if you wish:
gencsv -c foo.csv < columns.txt
CLI Interface
This is the default interface. It is invoked like other commands such as ls:
gencsv [options] <file> [columns..]
gencsv [options] <file> < columns.txt
Commands:
  <file>  The output file name
Options:
  --chunk                   The number of rows to generate per pass
                                                        [number] [default: 1000]
  --functions, --func       Lists available functions
  --rows, -r                The number of rows to generate (e.g. 100, 100000,
                            100k, 1M, 1B, etc.)                   [default: 100]
  --use-headers, -h         Use this flag to set column headers
                                                      [boolean] [default: false]
  --interactive, -i         Run the script in interactive mode with a series of
                            questions and user-provided answers
  --always-interactive, -a  Start the script in interactive mode and save this
                            setting
  --always-use-headers      Use this flag to persist column headers preferences
  --clear-settings, -c      Clear always-interactive settings
  --silent, -s              Minimal console output    [boolean] [default: false]
  --help                    Show help                                  [boolean]
  --version                 Show version number                        [boolean]
We can define our tables two ways:
- Manual entry
  When issuing the command, be sure to separate functions by space; other delimiters are not supported with this method of entry.
    gencsv foo.csv 'name email ccnumber date(2) pick(a|b)' # quote the argument to avoid escaping
    gencsv foo.csv name email ccnumber date\(2\) pick\(a\|b\) # or escape special BASH characters
  This will generate a .csv file in our current working directory with five columns and 100 rows.
  Need more rows? Need a lot of rows? No problem.
    gencsv foo.csv -r 100K 'name email ccnumber date(2) pick(AWS|Azure|Google Cloud|Digital Ocean)' # or
    gencsv foo.csv -r 100000 'name email ccnumber date(2) pick(AWS|Azure|Google Cloud|Digital Ocean)'
  Same table, but this time with 100,000 rows!
  Need even more? This script provides additional aliases:
  - K = 10^3 (e.g. 80K)
  - M = 10^6 (e.g. 0.2M)
  - B = 10^9 (e.g. 2.13B)
  Want to interpolate BASH variables? Use double quotes:
    gencsv foo.csv "name email ccnumber date(2) pick($USER|b)"
- Columns definition as plain text
  For wide tables, we can define our column definitions in a separate file:
    gencsv foo.csv < columns.txt
  where columns.txt contains:
    name, email, ccnumber, birthday, date, ...
  We can define column headers in columns.txt as well:
    Name, Email Address, Credit Card Number, Birthday, Transaction Date, ...
    name, email, ccnumber, birthday, date, ...
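The row-count aliases accepted by --rows (K, M, B) are plain powers of ten. As an illustration only, the conversion might look like the sketch below. parseRowCount is a hypothetical helper, not part of csv-generator, and treating the suffix as case-insensitive is an assumption based on the help text showing both 100k and 1M.

```javascript
// Hypothetical helper for illustration; csv-generator's real parsing may differ.
const MULTIPLIERS = { K: 1e3, M: 1e6, B: 1e9 };

function parseRowCount(input) {
  // A number (optionally with a decimal part) followed by an optional suffix.
  const match = /^(\d+(?:\.\d+)?)([KMB])?$/i.exec(String(input).trim());
  if (!match) {
    throw new Error(`Invalid row count: ${input}`);
  }
  const value = Number(match[1]);
  const multiplier = match[2] ? MULTIPLIERS[match[2].toUpperCase()] : 1;
  // Round so that fractional inputs such as 0.2M yield a whole number of rows.
  return Math.round(value * multiplier);
}

console.log(parseRowCount('80K'));   // 80000
console.log(parseRowCount('0.2M'));  // 200000
console.log(parseRowCount('2.13B')); // 2130000000
```

Rounding at the end keeps floating-point noise from fractional inputs out of the final row count.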
API Interface
What's that? An API? Yes. As of 1.0.2 you can require csv-generator in your node app:
const gencsv = require('csv-generator');
gencsv('foo.csv', ['name'], {
    rows: 100,
    chunks: 10,
    silent: true
}).then(
    res => {
        console.log(res);
    },
    e => {
        console.error(e);
    }
);
Performance Notes
By nature, some functions are considerably slower than others, for example:
| Function | Time to Write 10K Rows by 100 Columns | 
|---|---|
| string | 9.481s | 
| string(8) | 6.305s | 
| alpha | 6.138s | 
| alpha(8) | 4.061s | 
| sentence | 40.378s | 
By contrast, these functions are much faster:
| Function | Time to Write 10K Rows by 100 Columns | 
|---|---|
| guid | 1.264s | 
| age | 0.72s | 
| integer | 0.684s | 
| zip | 2.579s | 
| yn | 0.706s | 
If you need to generate large amounts of data for wide tables, prefer the faster functions.
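To get a rough feel for what these timings mean at larger sizes, the figures in the tables can be scaled by cell count. The sketch below assumes that write time grows linearly with rows times columns, which the source does not state, so treat the results as ballpark estimates. estimateSeconds is a hypothetical helper, not part of the tool.

```javascript
// The tables above report seconds for 10K rows by 100 columns.
const BASELINE_CELLS = 10000 * 100;

// Hypothetical helper: assumes time scales linearly with the number of cells.
function estimateSeconds(baselineSeconds, rows, columns) {
  return (baselineSeconds * rows * columns) / BASELINE_CELLS;
}

// Compare a slow and a fast function for 1M rows by 100 columns.
const slow = estimateSeconds(40.378, 1e6, 100); // sentence
const fast = estimateSeconds(0.684, 1e6, 100);  // integer
console.log(`sentence: ~${slow.toFixed(0)}s, integer: ~${fast.toFixed(0)}s`);
```

Under that linear assumption, the gap between a slow and a fast function grows from seconds to roughly an hour at a million rows.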
More Notes
Instead of hundreds of async file writes (i.e. page faults), this generator uses a single stream to write content to the file. The trade-off is normal heap usage and less CPU involvement, at the cost of more virtual memory. Memory is cheap; transistors aren't.
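The single-stream approach described above can be sketched with Node's built-in fs and events modules. This is not the tool's actual implementation. rowChunks and writeCsv are hypothetical names, and the chunk size here only loosely mirrors the --chunk option (rows generated per pass).

```javascript
const fs = require('fs');
const { once } = require('events');

// Yields CSV text one chunk at a time; each chunk holds up to chunkSize rows.
function* rowChunks(totalRows, chunkSize, makeRow) {
  if (chunkSize < 1) {
    throw new RangeError('chunkSize must be at least 1');
  }
  for (let start = 0; start < totalRows; start += chunkSize) {
    const end = Math.min(start + chunkSize, totalRows);
    const lines = [];
    for (let i = start; i < end; i++) {
      lines.push(makeRow(i));
    }
    yield lines.join('\n') + '\n';
  }
}

// Sends every chunk through one write stream instead of many separate writes.
async function writeCsv(path, totalRows, chunkSize, makeRow) {
  const stream = fs.createWriteStream(path);
  for (const chunk of rowChunks(totalRows, chunkSize, makeRow)) {
    // write() returns false when the internal buffer is full; wait for 'drain'.
    if (!stream.write(chunk)) {
      await once(stream, 'drain');
    }
  }
  stream.end();
  await once(stream, 'finish');
}
```

A call such as writeCsv('foo.csv', 100, 10, i => `${i},example`) would write 100 rows in ten passes. Waiting on 'drain' keeps the stream's buffer from growing without bound when rows are generated faster than the disk can accept them.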
Vendor Copyright Notice
This tool uses some scripts that are copyright of Data Design Group Inc. For more information see the README file in lib/vendor.
Developers
Want to add a feature? Have a useful generator? Need to debug? The readline interface makes for cumbersome debugging, but node has our back:
node --inspect --inspect-brk index.js
Or, if you need to debug the CLI interface:
node --inspect --inspect-brk bin/gencsv foo.csv < columns.txt
Open up Google Chrome and navigate to:
chrome://inspect
Running Tests
To run unit tests:
npm run coverage # or
yarn coverage