Fast concatenate multiple files on Linux


Even if there were such a tool, it could only work if every file except the last were guaranteed to have a size that is a multiple of the filesystem's block size.
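
To see whether a given set of files even meets that precondition, a check along these lines could be used. This is only a sketch: the function name is_block_aligned is made up for illustration, and it assumes GNU stat, where `-c %s` prints the file size and `-f -c %S` prints the filesystem's fundamental block size.

```shell
# Succeed if FILE's size is a whole multiple of its filesystem's block size.
# is_block_aligned is a hypothetical helper name; GNU stat is assumed.
is_block_aligned() {
  local file=$1 size bs
  size=$(stat -c %s "$file")     # file size in bytes
  bs=$(stat -f -c %S "$file")    # fundamental block size of the filesystem
  [ $((size % bs)) -eq 0 ]
}

# Small demonstration on a throwaway file.
tmp=$(mktemp)
is_block_aligned "$tmp" && echo "empty file: aligned"
printf 'x' > "$tmp"
is_block_aligned "$tmp" || echo "1-byte file: not aligned"
rm -f "$tmp"
```

In practice, output files from independent workers will rarely all pass this check, which is why the approaches below avoid the requirement altogether.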

If you control how the data is written into the temporary files, and you know how large each one will be, you can instead do the following:

  1. Before starting the multiprocessing, create the final output file and grow it to its final size, for example with ftruncate(), or by fseek()ing to the last byte and writing it. This creates a sparse file.

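  2. Start multiprocessing, handing each process the FD and the offset of its particular slice of the file. Since an inherited FD also shares its file position, each process should write with pwrite() or open its own descriptor.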

This way, the processes will collaboratively fill the single output file, removing the need to cat them together later.
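
The two steps above can be sketched in bash. This is a toy illustration, not a tuned implementation: the slice sizes and the chunks array are made-up stand-ins for real worker output, truncate is used to create the sparse file, and each background job opens the file itself and writes at its own offset with dd's conv=notrunc so that it does not truncate its neighbours' data.

```shell
# Toy example: three background "workers" fill disjoint slices of one file.
out=$(mktemp)                       # stand-in for the final output file
sizes=(5 7 4)                       # bytes each worker produces (assumed known)
chunks=("AAAAA" "BBBBBBB" "CCCC")   # stand-in for real worker output

# Step 1: grow the output file to its final size (this creates a sparse file).
total=0
for s in "${sizes[@]}"; do total=$((total + s)); done
truncate -s "$total" "$out"

# Step 2: start the workers, each writing at its own byte offset.
offset=0
for i in "${!sizes[@]}"; do
  # bs=1 makes seek= a byte offset; conv=notrunc keeps the rest of the file.
  printf '%s' "${chunks[$i]}" |
    dd of="$out" bs=1 seek="$offset" conv=notrunc status=none &
  offset=$((offset + sizes[i]))
done
wait                                # let every worker finish

result=$(cat "$out")
echo "$result"
```

bs=1 keeps the offset arithmetic simple but is slow for real data. GNU dd also accepts oflag=seek_bytes, which lets seek= be given in bytes while using a larger block size.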

EDIT

If you can't predict the size of the individual files, but the consumer of the final file can work with sequential (as opposed to random-access) input, you can feed the output of cat tmpfile1 ... tmpfileN to the consumer, either on stdin:

cat tmpfile1 ... tmpfileN | consumer

or via named pipes (using bash's Process Substitution):

consumer <(cat tmpfile1 ... tmpfileN)


You indicate that you don't know in advance the size of each temporary file. With this in mind, I think your best bet is to write a FUSE filesystem that would present the chunks as a single large file, while keeping them as individual files on the underlying filesystem.

In this solution, your producing and consuming apps remain unchanged. The producers write out a bunch of files that the FUSE layer makes appear as a single file. This virtual file is then presented to the consumer.

FUSE has bindings for a bunch of languages, including Python. If you look at some of the published examples (they exist for several different bindings), this requires surprisingly little code.


For 4 files (xaa, xab, xac, xad), a fast concatenation in bash (as root):

losetup -v -f xaa; losetup -v -f xab; losetup -v -f xac; losetup -v -f xad

(Let's suppose that loop0, loop1, loop2, loop3 are the names of the new device files.)

Put http://pastebin.com/PtEDQH7G into a "join_us" script file. Then you can use it like this:

./join_us /dev/loop{0..3}
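
The pastebin script is not reproduced here. As a rough idea of what such a join involves, device-mapper can present several block devices back to back through a table of "linear" targets, one line per device in the form `start length linear device offset`, with all numbers in 512-byte sectors. The sketch below only prints such a table; make_linear_table is a hypothetical helper, and the sector counts are invented for the demonstration.

```shell
# Print a device-mapper "linear" table that maps the given devices back to back.
# Arguments come in pairs: DEVICE SECTORS (512-byte sectors).
# make_linear_table is a hypothetical name, not taken from the linked script.
make_linear_table() {
  local start=0 dev sectors
  while [ "$#" -ge 2 ]; do
    dev=$1
    sectors=$2
    printf '%d %d linear %s 0\n' "$start" "$sectors" "$dev"
    start=$((start + sectors))
    shift 2
  done
}

# Demo with invented sizes; printing the table needs no root privileges.
table=$(make_linear_table /dev/loop0 2048 /dev/loop1 4096 /dev/loop2 1024)
printf '%s\n' "$table"

# As root, the real sizes would come from `blockdev --getsz DEVICE`, and the
# table would be piped into `dmsetup create joined` to get /dev/mapper/joined.
```

Nothing is copied in this scheme: the joined device just redirects each sector range to the matching loop device, which is why it is fast regardless of file size.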

Then (if this big file is a film) you can give ownership of it to a normal user (chown itsme /dev/mapper/joined), who can then play it via: mplayer /dev/mapper/joined

To clean up afterwards (as root):

dmsetup remove joined; losetup -d /dev/loop[0123]