How to split a file into equal parts, without breaking individual lines? [duplicate]


If you mean an equal number of lines, split has an option for this:

split --lines=75

If you need to know what that 75 should really be for N equal parts, it's:

lines_per_part = int((total_lines + N - 1) / N)

where total_lines can be obtained with wc -l.

See the following script for an example:

#!/usr/bin/bash

# Configuration stuff
fspec=qq.c
num_files=6

# Work out lines per file.
total_lines=$(wc -l <${fspec})
((lines_per_file = (total_lines + num_files - 1) / num_files))

# Split the actual file, maintaining lines.
split --lines=${lines_per_file} ${fspec} xyzzy.

# Debug information
echo "Total lines     = ${total_lines}"
echo "Lines  per file = ${lines_per_file}"
wc -l xyzzy.*

This outputs:

Total lines     = 70
Lines  per file = 12
  12 xyzzy.aa
  12 xyzzy.ab
  12 xyzzy.ac
  12 xyzzy.ad
  12 xyzzy.ae
  10 xyzzy.af
  70 total
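As a quick check of the arithmetic, the following sketch recomputes the figures from that example (70 lines, 6 files) using the same integer ceiling division. The variable last_file is introduced here purely for illustration and is not part of the script above.

```shell
# Ceiling division with integer arithmetic: (a + b - 1) / b
total_lines=70   # value taken from the example output
num_files=6

lines_per_file=$(( (total_lines + num_files - 1) / num_files ))

# Whatever is left after the first (num_files - 1) full files goes in the last one.
last_file=$(( total_lines - lines_per_file * (num_files - 1) ))

echo "lines_per_file = ${lines_per_file}"
echo "last_file      = ${last_file}"
```

One caveat worth knowing: when the line count is small relative to the number of files, this method can produce fewer files than requested. For instance, 10 lines with num_files=6 works out to 2 lines per file, which yields only 5 files.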

More recent versions of split allow you to specify a number of CHUNKS with the -n/--number option. You can therefore use something like:

split --number=l/6 ${fspec} xyzzy.

(that's ell-slash-six, meaning lines, not one-slash-six).

That will give you files of roughly equal size, with no mid-line splits.

I mention the size because this doesn't give you roughly the same number of lines in each file, but rather roughly the same number of characters.

So, if you have one 20-character line and 19 1-character lines (twenty lines in total) and split to five files, you most likely won't get four lines in every file.
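If you want to see this for yourself, the sketch below builds exactly that 20-line file in a temporary directory and prints the line count of each piece. It assumes GNU split is installed (for -n l/N); the names demo.txt and part. are made up for the illustration.

```shell
#!/usr/bin/env bash
# Assumes GNU split; demo.txt and the "part." prefix are illustrative names.
workdir=$(mktemp -d)
cd "${workdir}" || exit 1

# One 20-character line (twenty zeros) ...
printf '%020d\n' 0 > demo.txt
# ... followed by 19 one-character lines.
for i in $(seq 1 19); do
    printf 'x\n' >> demo.txt
done

# Split into 5 chunks of roughly equal byte size, never breaking a line.
split -n l/5 demo.txt part.

# Per-file line counts: they add up to 20, but need not be 4 each.
wc -l part.*
```

The pieces always concatenate back to the original file; only the distribution of lines among them is uneven.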


The script isn't even necessary; split(1) supports the wanted feature out of the box:

split -l 75 auth.log auth.log.

The above command splits the file into chunks of 75 lines apiece and writes output files of the form auth.log.aa, auth.log.ab, ...

wc -l on the original file and output gives:

  321 auth.log
   75 auth.log.aa
   75 auth.log.ab
   75 auth.log.ac
   75 auth.log.ad
   21 auth.log.ae
  642 total
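To reproduce those counts without a real auth.log, here is a small sketch that uses a generated 321-line stand-in file in a temporary directory. The generated contents (the numbers 1 to 321) are an assumption for the demo; only the line count matters.

```shell
#!/usr/bin/env bash
# Stand-in data: a 321-line file named auth.log in a scratch directory.
workdir=$(mktemp -d)
cd "${workdir}" || exit 1
seq 1 321 > auth.log

# 75 lines per piece; the last piece gets the remaining 21 lines.
split -l 75 auth.log auth.log.

# Line counts for the original and every piece.
wc -l auth.log auth.log.*
```

Since -l only needs the standard split options, this variant should also work where the GNU-specific -n l/N form is unavailable.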


A simple solution for a simple question:

split -n l/5 your_file.txt

No need for scripting here.

From the man page, CHUNKS may be:

l/N     split into N files without splitting lines

Update

Not all Unix distributions include this flag. For example, it will not work with the split that ships with OS X. To use it there, you can consider replacing the Mac OS X utilities with the GNU core utilities.
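If installing GNU coreutils is not an option, one possible workaround is a small wrapper that uses -n l/N when GNU split is detected and otherwise falls back to the line-count approach from the first answer. This is only a sketch: split_equal is a hypothetical helper name, the GNU detection via --version is an assumption about how the local split identifies itself, and the input file is assumed to be non-empty.

```shell
#!/usr/bin/env bash
# split_equal is a hypothetical helper, not part of any tool.
# Usage: split_equal FILE N PREFIX
split_equal() {
    local file=$1 parts=$2 prefix=$3
    if split --version 2>/dev/null | grep -q 'GNU'; then
        # GNU split: roughly equal byte sizes, no lines broken.
        split -n "l/${parts}" "${file}" "${prefix}"
    else
        # Fallback: equal line counts via integer ceiling division.
        local total
        total=$(wc -l < "${file}")
        split -l "$(( (total + parts - 1) / parts ))" "${file}" "${prefix}"
    fi
}
```

For example, split_equal your_file.txt 5 piece. would write piece.aa, piece.ab, and so on. Keep in mind that the two branches balance different things (bytes versus lines), so the resulting pieces can differ between systems.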