Bash: how to simply parallelize tasks?


Answering my own question... It turns out there's a relatively unknown feature of the xargs command that can be used to accomplish that:

find . -iname "*png" -print0 | xargs -0 --max-procs=4 -n 1 pngout

Bingo, instant 4x speedup on a quad-core machine :)
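
For anyone who wants to try this without pngout installed, here is a minimal sketch of the same xargs pattern. The temporary directory, the `workers` variable and the `sh -c 'echo ...'` stand-in are assumptions for illustration, not part of the original command; `-P` is the short form of `--max-procs`, and `nproc` (GNU coreutils) is used to pick the worker count, with 4 as a fallback.

```shell
#!/usr/bin/env bash
# Scratch directory with a few dummy files (hypothetical test data).
workdir=$(mktemp -d)
touch "$workdir"/a.png "$workdir"/b.png "$workdir"/c.png "$workdir"/d.png

# Number of parallel workers: core count if nproc exists, else 4.
workers=$(nproc 2>/dev/null || echo 4)

# Same shape as the pngout command; the echo is a stand-in for the real tool.
# Each file name is passed to the stand-in as $1.
processed=$(find "$workdir" -iname "*.png" -print0 |
  xargs -0 -P "$workers" -n 1 sh -c 'echo "processed $1"' _ |
  wc -l | tr -d ' ')

echo "$processed files handled with up to $workers workers"
rm -rf "$workdir"
```

Replacing the `sh -c '...' _` part with `pngout` should give back the original one-liner, with the worker count following the machine instead of being hard-coded.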


To spawn all tasks in the background:

find "$BASEDIR" -iname "*png" | while read -r f; do
  pngout "$f" &
done

But of course that isn't the best option, since it starts every task at once. To do 'n' tasks at a time:

i=0
find "$BASEDIR" -iname "*png" | while read -r f; do
  pngout "$f" &
  i=$((i+1))
  # once NTASKS jobs have been started, wait for the whole group
  if [[ $i -ge $NTASKS ]]; then
    wait
    i=0
  fi
done

It's not optimal, since it waits until all the concurrent tasks in a group have finished before starting the next group, but it should be better than nothing.
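
If your bash is 4.3 or newer, the "wait for the whole group" limitation can probably be avoided with `wait -n`, which returns as soon as any one background job exits. The following is only a sketch under that assumption: `process_one`, the scratch directory and the log file are hypothetical stand-ins so that it runs without pngout.

```shell
#!/usr/bin/env bash
# Rolling pool: keep up to NTASKS jobs running (needs bash 4.3+ for wait -n).
NTASKS=3
workdir=$(mktemp -d)
for n in 1 2 3 4 5 6 7 8; do touch "$workdir/$n.png"; done

# Stand-in for: pngout "$1"
process_one() {
  sleep 0.1
  echo "done $1" >> "$workdir/log"
}

running=0
# Process substitution keeps the loop in the current shell,
# so the final `wait` can see the background jobs.
while IFS= read -r -d '' f; do
  if (( running >= NTASKS )); then
    wait -n                      # block until any one job finishes
    running=$((running - 1))
  fi
  process_one "$f" &
  running=$((running + 1))
done < <(find "$workdir" -iname "*.png" -print0)
wait                             # let the last jobs finish

finished=$(wc -l < "$workdir/log" | tr -d ' ')
echo "$finished files processed"
rm -rf "$workdir"
```

A new job is started each time a slot frees up, so one slow file no longer holds back the rest of its group.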


Parallelization is rarely trivial. In your case, if you can split the files into disjoint, roughly equal-sized sets, then you can run multiple copies of your find script, one per set. You don't want to fire up 300 pictures in the background at once; for jobs like this it is usually faster to run them sequentially. Backgrounding the command or using batch are both viable options.

Assuming the files are consecutively numbered, you could use a find pattern like "*[0-4].png" for one find and "*[5-9].png" for another. This would keep two cores running for roughly the same amount of time.
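
A rough sketch of that split, assuming file names that end in a digit before the extension. The `img0.png` ... `img9.png` names, the scratch directory and the log files are made up for illustration, and the `sh -c 'echo ...'` part stands in for the real image command.

```shell
#!/usr/bin/env bash
# Dummy files img0.png .. img9.png (hypothetical names).
workdir=$(mktemp -d)
for n in 0 1 2 3 4 5 6 7 8 9; do touch "$workdir/img$n.png"; done

# Two finds over disjoint patterns, each running in the background.
find "$workdir" -name "*[0-4].png" \
  -exec sh -c 'echo "$1" >> "$2"' _ {} "$workdir/low.log" \; &
find "$workdir" -name "*[5-9].png" \
  -exec sh -c 'echo "$1" >> "$2"' _ {} "$workdir/high.log" \; &
wait   # block until both halves are done

low=$(wc -l < "$workdir/low.log" | tr -d ' ')
high=$(wc -l < "$workdir/high.log" | tr -d ' ')
echo "first find handled $low files, second handled $high"
rm -rf "$workdir"
```

The balance is only as good as the split: if the last digits are unevenly distributed, one find will finish well before the other.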

Farming tasks out would involve a dispatcher-runner setup. Building, testing, and running this would take quite a while.

Fire up BOINC to use those spare processors. You will likely want to ignore niced processes when monitoring CPU frequency. Add code like this to rc.local:

for CPU in /sys/devices/system/cpu/cpu[0-9]*; do
    echo 1 > ${CPU}/cpufreq/ondemand/ignore_nice_load
done