Bash: how to simply parallelize tasks?
Answering my own question... It turns out there's a relatively unknown feature of the xargs command that can be used to accomplish that:
find . -iname "*png" -print0 | xargs -0 --max-procs=4 -n 1 pngout
Bingo, instant 4x speedup on a quad-core machine :)
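A handy variant, assuming GNU xargs and coreutils (for `nproc`): let the machine pick the job count instead of hard-coding 4. `echo` stands in for pngout here so the sketch is self-contained:

```shell
# Sketch: match --max-procs to the core count with nproc (GNU coreutils).
# `echo` stands in for pngout so this runs without pngout installed.
printf '%s\0' a.png b.png c.png \
  | xargs -0 --max-procs="$(nproc)" -n 1 echo compressing
```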
To spawn all tasks in the background:
find "$BASEDIR" -iname "*png" | while read -r f; do pngout "$f" & done
But of course that isn't the best option. To do N tasks at a time:
i=0
find $BASEDIR -iname "*png" | while read f; do
    pngout "$f" &
    i=$((i+1))
    if [[ $i -gt $NTASKS ]]; then
        wait
        i=0
    fi
done
It's not optimal, since it waits until all the concurrent tasks have finished before starting another group, but it should be better than nothing.
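That limitation can be avoided with `wait -n`, which returns as soon as any one job exits. This is a sketch assuming bash 4.3 or newer; the `process` function and the fixed file list are placeholders for pngout and the find pipeline:

```shell
#!/bin/bash
# Sketch, assuming bash >= 4.3: `wait -n` blocks until ANY one job exits,
# so a new task starts the moment a slot frees, not after the whole batch.
# `process` and the file names are stand-ins for pngout and find's output.
process() { sleep 0.1; echo "done $1"; }

NTASKS=4
i=0
for f in img1.png img2.png img3.png img4.png img5.png img6.png; do
    process "$f" &
    i=$((i+1))
    if (( i >= NTASKS )); then
        wait -n          # wait for one job to finish, not all of them
        i=$((i-1))
    fi
done
wait                     # collect the remaining background jobs
```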
Parallelization is rarely trivial. In your case, if you can select the files uniquely into equal-sized sets, then you can run multiple copies of your find script. But you don't want to fire up 300 pictures in the background; for jobs like this it is usually faster to run them sequentially than to oversubscribe the machine. Backgrounding the command or using batch are both viable options.
Assuming the files are consecutively numbered, you could use a find pattern like "[0-4].png" for one find and "[5-9].png" for another. This would keep two cores running for roughly the same amount of time.
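That split can be sketched as two concurrent find jobs over disjoint name ranges. The throwaway directory and `echo` (standing in for pngout) are just for illustration:

```shell
# Sketch: partition files by final digit and process each half concurrently.
# A temp directory and `echo` stand in for the real data and pngout.
dir=$(mktemp -d)
touch "$dir"/photo{0..9}.png

find "$dir" -iname "*[0-4].png" -exec echo compress {} \; &
find "$dir" -iname "*[5-9].png" -exec echo compress {} \; &
wait   # block until both halves are done
```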
Farming tasks out would involve a dispatcher-runner setup. Building, testing, and running this would take quite a while.
Fire up BOINC to use those spare processors. You will likely want the ondemand governor to ignore niced processes when scaling CPU frequency. Add code like this to rc.local:
for CPU in /sys/devices/system/cpu/cpu[0-9]*; do
    echo 1 > ${CPU}/cpufreq/ondemand/ignore_nice_load
done