Splitting files in Unix

I assume you're using split -b, which is more CPU-efficient than splitting by lines, but it still reads the whole input file and writes it out to each output file. If the serial nature of this part of split's execution is your bottleneck, you can use dd to extract the chunks of the file in parallel. You will need a distinct dd command for each parallel process. Here's one example command line (assuming the_input_file is large, this extracts a chunk from the middle):

dd skip=400 count=1 if=the_input_file bs=512 of=_output

To make this work you will need to choose appropriate values for count and bs (the ones above are very small). Each worker will also need a different value of skip so that the chunks don't overlap; note that skip is counted in blocks of bs bytes. This is efficient because dd implements skip with a seek rather than by reading through the skipped data.
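For example, here is a minimal sketch of running several such extractions in parallel from a shell. The file name, chunk size, and output names are placeholders for whatever your workers actually need:

# Extract four non-overlapping 64 MiB chunks of the_input_file in parallel.
# With bs=1M each chunk is 64 blocks, so worker i skips i*64 blocks.
for i in 0 1 2 3; do
dd if=the_input_file bs=1M skip=$((i * 64)) count=64 of=chunk_$i 2>/dev/null &
done
wait   # all four dd processes have finished; chunk_0..chunk_3 are ready

Because each dd seeks straight to its own offset, the four extractions proceed independently of one another.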

Of course, this is still not as efficient as implementing your data consumer process in such a way that it can read a specified chunk of the input file directly, in parallel with other similar consumer processes. But I assume if you could do that you would not have asked this question.


Given that split is a standard OS utility, my inclination would be to assume it is already reasonably well optimized.

You can see this question (or do a man -k split or man split) to find related commands that you might be able to use instead of split.

If you are thinking of implementing your own solution in, say, C, then I would suggest you run some benchmarks for your specific system/environment and some sample data, and use the results to decide which tool to use.
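As a rough sketch of the kind of comparison meant here (big_sample_file, the 64 MiB chunk size, and the four-way parallelism are placeholder values; adjust them so both commands process the same amount of data):

time split -b 64M big_sample_file split_
time sh -c 'for i in 0 1 2 3; do dd if=big_sample_file bs=1M skip=$((i*64)) count=64 of=dd_$i 2>/dev/null & done; wait'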

Note: if you aren't going to be doing this regularly, it may not be worth your while to think about it much; just go ahead and use a tool that does what you need (in this case, split).