
how to write a process-pool bash shell


Use xargs:

xargs -P <maximum-number-of-processes-at-a-time> -n <arguments-per-process> <command>

Details here.
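As a concrete illustration of the usage line above, here is a minimal sketch. The wrapper name run_pool and the echo payload are made up for the example; -P is supported by GNU and BSD xargs but is not part of POSIX, so treat its availability as an assumption about your system.

```shell
#!/bin/bash
# run_pool is a hypothetical wrapper name, not something xargs provides.
run_pool() {
    local max_procs=$1
    shift
    # -n 1: one argument per process; -P: at most $max_procs processes at a time.
    # The "_" fills $0 of the inner shell so the item arrives as $1.
    printf '%s\n' "$@" | xargs -P "$max_procs" -n 1 sh -c 'echo "processed $1"' _
}

# Five items, at most three running at once; output order may vary.
run_pool 3 a b c d e
```

Replace the sh -c payload with the real command; because the processes run concurrently, do not rely on the order of the output lines.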


I chanced upon this thread while looking into writing my own process pool, and I particularly liked Brandon Horsley's solution, though I couldn't get the signals working right. So I took inspiration from Apache and decided to try a pre-fork model with a fifo as my job queue.

The following function is what each worker process runs once it has been forked.

# \brief the worker function that is called when we fork off worker processes
# \param[in] id  the worker ID
# \param[in] job_queue  the fifo to read jobs from
# \param[in] result_log  the temporary log file to write exit codes to
function _job_pool_worker()
{
    local id=$1
    local job_queue=$2
    local result_log=$3
    local line=

    exec 7<> ${job_queue}
    while [[ "${line}" != "${job_pool_end_of_jobs}" && -e "${job_queue}" ]]; do
        # workers block on the exclusive lock to read the job queue
        flock --exclusive 7
        read line <${job_queue}
        flock --unlock 7

        # the worker should exit if it sees the end-of-job marker or run the
        # job otherwise and save its exit code to the result log.
        if [[ "${line}" == "${job_pool_end_of_jobs}" ]]; then
            # write it one more time for the next sibling so that everyone
            # will know we are exiting.
            echo "${line}" >&7
        else
            _job_pool_echo "### _job_pool_worker-${id}: ${line}"
            # run the job
            { ${line} ; }
            # now check the exit code and prepend "ERROR" to the result log entry
            # which we will use to count errors and then strip out later.
            local result=$?
            local status=
            if [[ "${result}" != "0" ]]; then
                status=ERROR
            fi

            # now write the error to the log, making sure multiple processes
            # don't trample over each other.
            exec 8<> ${result_log}
            flock --exclusive 8
            echo "${status}job_pool: exited ${result}: ${line}" >> ${result_log}
            flock --unlock 8
            exec 8>&-
            _job_pool_echo "### _job_pool_worker-${id}: exited ${result}: ${line}"
        fi
    done
    exec 7>&-
}
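The comments in the worker say the "ERROR" prefix is used to count errors and is stripped out later. One way that post-processing could look is sketched below. The helper names count_job_errors and strip_error_prefix are hypothetical and are not taken from the actual script; only the log line format matches what the worker writes.

```shell
#!/bin/bash
# Hypothetical helpers; the real job_pool.sh may do this differently.

# Print the number of result-log entries that start with "ERROR".
count_job_errors() {
    local result_log=$1
    # grep -c prints the match count but exits non-zero when the count is 0,
    # so the exit status is deliberately ignored.
    grep -c '^ERROR' "$result_log" || true
}

# Print the log with the "ERROR" marker removed from the start of each line.
strip_error_prefix() {
    local result_log=$1
    sed 's/^ERROR//' "$result_log"
}

# Demo with two entries in the format the worker writes.
log=$(mktemp)
printf '%s\n' 'job_pool: exited 0: sleep 1' \
              'ERRORjob_pool: exited 1: /bin/false' > "$log"
echo "errors: $(count_job_errors "$log")"   # prints "errors: 1"
strip_error_prefix "$log"
rm -f "$log"
```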

You can get a copy of my solution on GitHub. Here's a sample program using my implementation.

#!/bin/bash

. job_pool.sh

function foobar()
{
    # do something
    true
}

# initialize the job pool to allow 3 parallel jobs and echo commands
job_pool_init 3 0

# run jobs
job_pool_run sleep 1
job_pool_run sleep 2
job_pool_run sleep 3
job_pool_run foobar
job_pool_run foobar
job_pool_run /bin/false

# wait until all jobs complete before continuing
job_pool_wait

# more jobs
job_pool_run /bin/false
job_pool_run sleep 1
job_pool_run sleep 2
job_pool_run foobar

# don't forget to shut down the job pool
job_pool_shutdown

# check the $job_pool_nerrors for the number of jobs that exited non-zero
echo "job_pool_nerrors: ${job_pool_nerrors}"
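To see the pre-fork idea end to end without the full library, here is a stripped-down sketch of the same pattern: workers are forked up front, pull one command line at a time from a shared fifo under a lock, and pass an end marker along to their siblings. Every name in it (mini_pool_run, mini_pool_worker, MINI_POOL_END) is hypothetical and not taken from job_pool.sh. It assumes the flock command from util-linux is installed and that opening a fifo read-write does not block, which holds on Linux.

```shell
#!/bin/bash
# Stripped-down pre-fork pool. All names are hypothetical, not from job_pool.sh.

MINI_POOL_END='__mini_pool_end__'

mini_pool_worker() {
    local id=$1 lock_file=$2 line
    # Open the lock file after the fork so each worker gets its own open file
    # description; a lock descriptor inherited from the parent would be shared
    # and would not exclude siblings.
    exec 8>>"$lock_file"
    while true; do
        flock 8                         # only one worker reads at a time
        IFS= read -r line <&7
        flock -u 8
        if [[ "$line" == "$MINI_POOL_END" ]]; then
            echo "$MINI_POOL_END" >&7   # pass the marker on to the next sibling
            break
        fi
        echo "worker-${id}: ${line}"
        ${line}                         # unquoted on purpose: split into command and args
    done
}

mini_pool_run() {
    local nworkers=$1
    shift
    local dir i job
    dir=$(mktemp -d)
    mkfifo "$dir/queue"
    exec 7<>"$dir/queue"                # read-write open: no blocking, no EOF
    for ((i = 0; i < nworkers; i++)); do
        mini_pool_worker "$i" "$dir/lock" &
    done
    for job in "$@"; do
        echo "$job" >&7
    done
    echo "$MINI_POOL_END" >&7
    wait                                # returns once every worker has seen the marker
    exec 7>&-
    rm -rf "$dir"
}

# Two workers share three jobs; which worker runs which job may vary.
mini_pool_run 2 "echo hello" "echo world" "true"
```

Unlike the full implementation, this sketch records no exit codes and queues all jobs before waiting, so it only suits job lists small enough to fit in the pipe buffer.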

Hope this helps!


Using GNU Parallel you can do:

cat tasks | parallel -j4 myprog

If you have 4 cores, you can even just do the following, since GNU Parallel defaults to one job per CPU core:

cat tasks | parallel myprog

From http://git.savannah.gnu.org/cgit/parallel.git/tree/README:

Full installation

Full installation of GNU Parallel is as simple as:

./configure && make && make install

Personal installation

If you are not root you can add ~/bin to your path and install in ~/bin and ~/share:

./configure --prefix=$HOME && make && make install

Or if your system lacks 'make' you can simply copy src/parallel src/sem src/niceload src/sql to a dir in your path.

Minimal installation

If you just need parallel and do not have 'make' installed (maybe the system is old or Microsoft Windows):

wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 parallel
cp parallel sem
mv parallel sem dir-in-your-$PATH/bin/

Test the installation

After this you should be able to do:

parallel -j0 ping -nc 3 ::: foss.org.my gnu.org freenetproject.org

This will send 3 ping packets to 3 different hosts in parallel and print the output when they complete.

Watch the intro video for a quick introduction: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1