
When should xargs be preferred over while-read loops?


The thing with while loops is that they tend to process one item at a time, often when it's unnecessary. This is where xargs has an advantage: it can batch up the arguments so that one command processes many items at once.

For example, a while loop:

pax> echo '1
2
3
4
5' | while read -r; do echo $REPLY; done
1
2
3
4
5

and the corresponding xargs:

pax> echo '1
2
3
4
5' | xargs echo
1 2 3 4 5

Here you can see that the lines are processed one-by-one with the while and altogether with the xargs. In other words, the former is equivalent to echo 1 ; echo 2 ; echo 3 ; echo 4 ; echo 5 while the latter is equivalent to echo 1 2 3 4 5 (five processes as opposed to one). This really makes a difference when processing thousands or tens of thousands of lines, since process creation takes time.

It's mostly advantageous when using commands that can accept multiple arguments since it reduces the number of individual processes started, making things much faster.
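The batching is easy to see with the -n option of xargs, which caps how many arguments go to each invocation (a small illustration of my own, not from the example above):

```shell
# -n 3 packs at most three items into each echo invocation,
# so seven input lines become three echo processes instead of seven.
printf '%s\n' a b c d e f g | xargs -n 3 echo
# a b c
# d e f
# g
```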

When I'm processing small files or the commands to run on each item are complicated (where I'm too lazy to write a separate script to give to xargs), I will use the while variant.

Where I'm interested in performance (large files), I will use xargs, even if I have to write a separate script.
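A sketch of what that separate script can look like (the script name and its contents here are hypothetical): put the complicated per-item logic in a small script, and let xargs call it with batches of arguments:

```shell
# Hypothetical helper script holding the complicated per-item logic.
cat > /tmp/process_items.sh <<'EOF'
#!/bin/sh
# Receives a batch of items as arguments; loops over them in one process.
for item in "$@"; do
    echo "processing $item"
done
EOF
chmod +x /tmp/process_items.sh

# xargs hands the script as many items per invocation as fit,
# so only a few processes start even for very large input.
printf '%s\n' alpha beta gamma | xargs /tmp/process_items.sh
```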


Some implementations of xargs also understand a -P MAX-PROCS argument which lets xargs run multiple jobs in parallel. This would be quite difficult to simulate with a while read loop.
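A minimal sketch of -P (supported by GNU and BSD xargs, though not required by POSIX): run up to four jobs at once, one argument each. With parallel jobs the output order is not guaranteed, hence the sort:

```shell
# -P 4: up to four jobs in parallel; -n 1: one argument per job.
# Output order is nondeterministic under -P, so sort before comparing.
printf '%s\n' a b c d | xargs -P 4 -n 1 echo | sort
# a
# b
# c
# d
```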


GNU Parallel http://www.gnu.org/software/parallel/ has the advantages of xargs (using -m), the advantage of while-read in using newline as the separator, and some new features (e.g. grouping of output, running jobs in parallel on remote computers, and context replace).

If you have GNU Parallel installed, I cannot see a single situation in which you would use xargs. And the only situation in which I would use while-read is if the block to execute is so big it becomes unreadable on a single line (e.g. if it contains if-statements or similar) and you refuse to make it a bash function.

Even for small scripts I actually find it more readable to use GNU Parallel. paxdiablo's example:

echo '1
2
3
4
5' | parallel -m echo

Converting of WAV files to MP3 using GNU Parallel:

find sounddir -type f -name '*.wav' | parallel -j+0 lame {} -o {.}.mp3

Watch the intro video for GNU Parallel: http://www.youtube.com/watch?v=OpaiGYxkSuQ