How can I split a large text file into smaller files with an equal number of lines?


Have a look at the split command:

$ split --help
Usage: split [OPTION] [INPUT [PREFIX]]
Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; default
size is 1000 lines, and default PREFIX is `x'.  With no INPUT, or when INPUT
is -, read standard input.

Mandatory arguments to long options are mandatory for short options too.
  -a, --suffix-length=N   use suffixes of length N (default 2)
  -b, --bytes=SIZE        put SIZE bytes per output file
  -C, --line-bytes=SIZE   put at most SIZE bytes of lines per output file
  -d, --numeric-suffixes  use numeric suffixes instead of alphabetic
  -l, --lines=NUMBER      put NUMBER lines per output file
      --verbose           print a diagnostic to standard error just
                            before each output file is opened
      --help     display this help and exit
      --version  output version information and exit

You could do something like this:

split -l 200000 filename

which will create files of 200000 lines each (the last one holds whatever is left over), named xaa, xab, xac, ...
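A minimal sketch to try `split -l` safely, assuming GNU coreutils `split` and `seq`. It runs in a throwaway directory, and `big.txt` is a hypothetical stand-in for your real file. It also shows that the alphabetical suffixes sort in creation order, so a plain glob rejoins the pieces.

```shell
set -eu
workdir=$(mktemp -d)            # scratch directory so nothing real is touched
cd "$workdir"

seq 1 450000 > big.txt          # hypothetical input: 450000 lines
split -l 200000 big.txt         # writes xaa, xab, xac (default prefix "x")

wc -l xa?                       # 200000, 200000 and the 50000-line remainder

# xaa < xab < xac alphabetically, so the glob expands in the original order.
cat xa? | cmp - big.txt && echo "pieces rejoin to the original"
```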

Another option is to split by output file size with -C. Unlike -b, it keeps lines whole; only a single line longer than the limit gets broken:

split -C 20m --numeric-suffixes input_filename output_prefix

creates files like output_prefix00 output_prefix01 output_prefix02 ... each of at most 20 MiB (20m means 20*1024*1024 bytes).
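A small sketch of `-C` at a scale you can inspect, assuming GNU coreutils `split` (whose help output is quoted above). The input is generated with `seq`, and the 100K limit is an arbitrary example value. Each piece stays within the limit and ends on a newline, and numeric suffixes start at 00.

```shell
set -eu
workdir=$(mktemp -d)                 # scratch directory for the demo
cd "$workdir"

seq 1 100000 > input_filename        # about 575 KiB of short lines (stand-in input)

# At most 100K (102400 bytes) per piece, breaking only between lines.
split -C 100K --numeric-suffixes input_filename output_prefix

ls output_prefix*                    # output_prefix00, output_prefix01, ...
```

Because every piece ends with a complete line, `cat output_prefix*` reproduces the input exactly.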


Use the split command:

split -l 200000 mybigfile.txt
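If you know how many files you want rather than how many lines each should hold, one approach (a sketch, not the only way) is to count the lines first and round the per-file count up. The values `n=3` and the generated `mybigfile.txt` are hypothetical. Rounding up gives at most `n` pieces, all but the last with exactly `per_file` lines. GNU `split` also offers `-n l/N`, which makes N pieces without breaking lines but balances them by bytes, not by line count.

```shell
set -eu
workdir=$(mktemp -d)                          # scratch directory for the demo
cd "$workdir"
seq 1 1000 > mybigfile.txt                    # stand-in input: 1000 lines

n=3                                           # desired number of pieces (example value)
total=$(wc -l < mybigfile.txt)                # line count of the input
per_file=$(( (total + n - 1) / n ))           # ceiling division: never more than n pieces
[ "$per_file" -ge 1 ] || per_file=1           # split rejects -l 0 (e.g. an empty file)

split -l "$per_file" mybigfile.txt part_      # part_aa, part_ab, part_ac
wc -l part_*                                  # 334, 334 and 332 lines
```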


Yes, there is a split command. It will split a file by lines or bytes.

$ split --help
Usage: split [OPTION]... [INPUT [PREFIX]]
Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; default
size is 1000 lines, and default PREFIX is `x'.  With no INPUT, or when INPUT
is -, read standard input.

Mandatory arguments to long options are mandatory for short options too.
  -a, --suffix-length=N   use suffixes of length N (default 2)
  -b, --bytes=SIZE        put SIZE bytes per output file
  -C, --line-bytes=SIZE   put at most SIZE bytes of lines per output file
  -d, --numeric-suffixes  use numeric suffixes instead of alphabetic
  -l, --lines=NUMBER      put NUMBER lines per output file
      --verbose           print a diagnostic just before each
                            output file is opened
      --help     display this help and exit
      --version  output version information and exit

SIZE may have a multiplier suffix:
b 512, kB 1000, K 1024, MB 1000*1000, M 1024*1024,
GB 1000*1000*1000, G 1024*1024*1024, and so on for T, P, E, Z, Y.
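To see the practical difference between splitting by bytes and by lines, here is a tiny sketch assuming GNU coreutils `split`. The file `words.txt` and the 10-byte limit are made-up example values. `-b` cuts at exact byte offsets, even in the middle of a line, while `-C` only cuts between lines.

```shell
set -eu
workdir=$(mktemp -d)                         # scratch directory for the demo
cd "$workdir"
printf 'alpha\nbravo\ncharlie\n' > words.txt # 20 bytes, 3 lines

split -b 10 words.txt by_bytes_   # exactly 10 bytes per piece: lines may be cut
split -C 10 words.txt by_lines_   # at most 10 bytes per piece, whole lines only

cat by_bytes_aa; echo '|'         # prints "alpha", then "brav|": "bravo" was cut
cat by_lines_aa                   # prints "alpha"
```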