How to write a process pool in bash
Use xargs:

```shell
xargs -P <maximum-number-of-processes-at-a-time> -n <arguments-per-process> <command>
```
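For example, here is a small sketch (using `echo` as a stand-in for a real command) that runs at most 3 processes at a time, passing 2 arguments to each invocation:

```shell
# six inputs, 2 arguments per invocation => 3 invocations,
# with at most 3 of them running concurrently
printf '%s\n' one two three four five six | xargs -P 3 -n 2 echo got:
```

Note that with `-P` greater than 1 the output order of the invocations is not deterministic.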
I chanced upon this thread while looking into writing my own process pool and particularly liked Brandon Horsley's solution, though I couldn't get the signals working right, so I took inspiration from Apache and decided to try a pre-fork model with a fifo as my job queue.
The following is the function that the worker processes run when forked.
```shell
# \brief the worker function that is called when we fork off worker processes
# \param[in] id  the worker ID
# \param[in] job_queue  the fifo to read jobs from
# \param[in] result_log  the temporary log file to write exit codes to
function _job_pool_worker()
{
    local id=$1
    local job_queue=$2
    local result_log=$3
    local line=

    exec 7<> ${job_queue}
    while [[ "${line}" != "${job_pool_end_of_jobs}" && -e "${job_queue}" ]]; do
        # workers block on the exclusive lock to read the job queue
        flock --exclusive 7
        read line <${job_queue}
        flock --unlock 7
        # the worker should exit if it sees the end-of-job marker or run the
        # job otherwise and save its exit code to the result log.
        if [[ "${line}" == "${job_pool_end_of_jobs}" ]]; then
            # write it one more time for the next sibling so that everyone
            # will know we are exiting.
            echo "${line}" >&7
        else
            _job_pool_echo "### _job_pool_worker-${id}: ${line}"
            # run the job
            { ${line} ; }
            # now check the exit code and prepend "ERROR" to the result log entry
            # which we will use to count errors and then strip out later.
            local result=$?
            local status=
            if [[ "${result}" != "0" ]]; then
                status=ERROR
            fi
            # now write the error to the log, making sure multiple processes
            # don't trample over each other.
            exec 8<> ${result_log}
            flock --exclusive 8
            echo "${status}job_pool: exited ${result}: ${line}" >> ${result_log}
            flock --unlock 8
            exec 8>&-
            _job_pool_echo "### _job_pool_worker-${id}: exited ${result}: ${line}"
        fi
    done
    exec 7>&-
}
```
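The result-log locking is the part that is easiest to get wrong, so here is a minimal sketch of the same flock-on-a-file-descriptor pattern in isolation (the temp file and worker loop are made up for the example): four background subshells append to a shared log under an exclusive lock on fd 8, so their writes cannot interleave.

```shell
log=$(mktemp)
for i in 1 2 3 4; do
    (
        # open the shared log on fd 8 and take an exclusive lock on it
        exec 8<> "$log"
        flock --exclusive 8
        echo "worker $i done" >> "$log"
        flock --unlock 8
        exec 8>&-
    ) &
done
wait
wc -l < "$log"    # 4 lines, one per worker
rm -f "$log"
```

This requires the util-linux `flock` utility, which is standard on most Linux systems but not part of POSIX.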
You can get a copy of my solution on GitHub. Here's a sample program using my implementation.
```shell
#!/bin/bash
. job_pool.sh

function foobar()
{
    # do something
    true
}

# initialize the job pool to allow 3 parallel jobs and echo commands
job_pool_init 3 0

# run jobs
job_pool_run sleep 1
job_pool_run sleep 2
job_pool_run sleep 3
job_pool_run foobar
job_pool_run foobar
job_pool_run /bin/false

# wait until all jobs complete before continuing
job_pool_wait

# more jobs
job_pool_run /bin/false
job_pool_run sleep 1
job_pool_run sleep 2
job_pool_run foobar

# don't forget to shut down the job pool
job_pool_shutdown

# check the $job_pool_nerrors for the number of jobs that exited non-zero
echo "job_pool_nerrors: ${job_pool_nerrors}"
```
Hope this helps!
Using GNU Parallel you can do:
cat tasks | parallel -j4 myprog
If you have 4 cores, you can even just do:
cat tasks | parallel myprog
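GNU Parallel can also take its arguments directly on the command line with `:::`, substituting each one for the `{}` placeholder; a small sketch (with `echo` standing in for `myprog`):

```shell
# run up to 2 jobs at a time, one argument each, substituted at {}
parallel -j2 echo 'processing {}' ::: a b c
```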
From http://git.savannah.gnu.org/cgit/parallel.git/tree/README:
Full installation
Full installation of GNU Parallel is as simple as:
./configure && make && make install
Personal installation
If you are not root you can add ~/bin to your path and install in ~/bin and ~/share:
./configure --prefix=$HOME && make && make install
Or if your system lacks 'make' you can simply copy src/parallel src/sem src/niceload src/sql to a dir in your path.
Minimal installation
If you just need parallel and do not have 'make' installed (maybe the system is old or Microsoft Windows):
```shell
wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 parallel
cp parallel sem
mv parallel sem dir-in-your-$PATH/bin/
```
Test the installation
After this you should be able to do:
parallel -j0 ping -nc 3 ::: foss.org.my gnu.org freenetproject.org
This will send 3 ping packets to 3 different hosts in parallel and print the output when they complete.
Watch the intro video for a quick introduction: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1