GNU Parallel, too many input files, Argument list too long


Try:

ls samplefolder | grep '\.txt$' | parallel "sample operation samplefolder/{}"

Since the file names arrive on parallel's stdin instead of on the command line, the ARG_MAX limit never comes into play. Quoting and anchoring the grep pattern makes sure only names ending in .txt match.
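If parsing ls output worries you (it breaks on file names containing newlines), a sketch of the same idea with NUL-separated names, assuming your GNU Parallel supports the --null option:

find samplefolder -name '*.txt' -print0 | parallel --null "sample operation {}"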


Here's how you can deal with this on a typical UNIX box (I assume OS X has find and xargs too):

find samplefolder -name \*.txt -print0 | xargs -P 8 -n 1 -0 sample operation

find prints all the .txt file names in samplefolder, separated by NUL characters. xargs in turn reads this NUL-separated list (-0) and, for every N file names (-n 1, i.e. one at a time here), launches sample operation path/file.txt, running up to 8 invocations (-P 8) in parallel.
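If sample operation accepts several files per invocation, a variant of the above (with sample operation still standing in for your real command) cuts down the number of processes spawned by batching the names:

# pass up to 16 file names to each invocation, 8 invocations at a time
find samplefolder -name \*.txt -print0 | xargs -0 -P 8 -n 16 sample operation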


Handle that operation in smaller batches using -N, and pipe the input file list rather than giving it on the command line.

For example, expanding on ArtemB's answer, to process in batches of 16 files (warning, this will break with paths containing newlines):

find samplefolder -type f -name "*.txt" | parallel -N16 "sample operation" {}
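To check what command lines parallel will actually build before running anything, its --dry-run option prints them instead of executing them:

find samplefolder -type f -name "*.txt" | parallel -N16 --dry-run "sample operation" {}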

To tailor the maximum batch size, check the ARG_MAX limit in your environment with getconf. For example:

~$ getconf ARG_MAX
2097152

Given that paths on *nix can typically be up to 4096 characters long, that leaves me free to put 2097152/4096 = 512 file paths on the command line (excluding the "sample operation" command itself, of course).
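The same arithmetic can be done directly in the shell; a quick sketch, assuming your system exposes PATH_MAX through getconf (4096 is the usual Linux value):

~$ echo $(( $(getconf ARG_MAX) / $(getconf PATH_MAX /) ))
512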

So something like

find samplefolder -name "*.txt" | parallel -N500 "sample operation" {}

would let me process in batches of 500. Of course, depending on what tool you are running, you may want to experiment and optimize the batch size for speed.
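If you want to measure the effect, a rough benchmarking loop (with sample operation again standing in for your real command) could look like:

# time a run at several batch sizes and compare
for n in 1 16 100 500; do
    echo "batch size: $n"
    time ( find samplefolder -name "*.txt" | parallel -N"$n" "sample operation" {} )
done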