Bash while read loop extremely slow compared to cat, why?

linux bash performance shell

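The reason while read is so slow is that, when reading from a pipe, the shell has to make a read system call for every single byte. It cannot read a large buffer from the pipe, because it must not consume more than one line from the input stream, and the only way to stop exactly at the end of the line is to fetch one character at a time and compare it against a newline. (With a seekable regular file, bash can read a larger block and seek back to just after the newline, but a pipe cannot be rewound.) If you run strace on a while read loop, you can see this behavior. This behavior is desirable, because it makes it possible to reliably do things like: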

while read -r size; do dd bs="$size" count=1 of="file$(( i++ ))"; done

in which the commands inside the loop are reading from the same stream that the shell reads from. If the shell consumed a big chunk of data by reading large buffers, the inner commands would not have access to that data. An unfortunate side-effect is that read is absurdly slow.
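A minimal sketch of the behavior described above, assuming bash: read takes exactly one line from a pipe, so a later command in the same group still receives the rest of the stream. The function name demo_shared_stdin is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: `read` stops right after the first newline, so the next command
# reading the same stdin (here: cat) still sees the remaining lines.
demo_shared_stdin() {
  printf 'first\nsecond\nthird\n' | {
    IFS= read -r line            # consumes only "first\n" from the pipe
    printf 'read got: %s\n' "$line"
    cat                          # prints the lines that read left behind
  }
}

demo_shared_stdin
```

It should print "read got: first" followed by "second" and "third". To watch the one-byte reads yourself (if strace is installed), something like printf 'a\nb\n' | strace -e trace=read bash -c 'read -r x' should show a series of read(0, ..., 1) calls after the usual startup noise.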


Part of the reason is that a bash script is interpreted and not really optimised for speed in this case. You're usually better off using one of the external tools, such as:

awk 'NR%1000==0{print}' inputFile

which matches your "print every 1000 lines" sample.
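For comparison, a minimal sketch of the same "every n-th line" filter written as a pure bash while read loop next to the awk one-liner. The function names every_nth_bash and every_nth_awk are hypothetical; both read stdin and write stdout.

```shell
#!/usr/bin/env bash
# Sketch: print every n-th line, once as a pure bash loop and once with awk.

every_nth_bash() {
  local n=$1 count=0 line
  # `|| [[ -n $line ]]` keeps a final line that has no trailing newline.
  while IFS= read -r line || [[ -n $line ]]; do
    if (( ++count % n == 0 )); then
      printf '%s\n' "$line"
    fi
  done
}

every_nth_awk() {
  # NR is awk's running line number; a pattern with no action prints the line.
  awk -v n="$1" 'NR % n == 0'
}

seq 5000 | every_nth_bash 1000
seq 5000 | every_nth_awk 1000
```

Both calls should print 1000, 2000, 3000, 4000 and 5000. Timing them on a bigger input, e.g. time (seq 1000000 | every_nth_bash 1000 >/dev/null) against the awk version, should make the gap visible; the exact ratio will depend on the machine.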

If you wanted to output, for each line, its length in characters followed by the line itself, and pipe the result through another process, you could do that too:

awk '{print length($0)" "$0}' inputFile | someOtherProcess
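A minimal sketch of the bash loop this awk command replaces, with both producing the same "length line" format. The function names are hypothetical, and for non-ASCII input the counts may differ depending on the locale and the awk implementation (mawk, for example, counts bytes rather than characters).

```shell
#!/usr/bin/env bash
# Sketch: prefix each line with its length, in bash and in awk.

len_prefix_bash() {
  local line
  while IFS= read -r line || [[ -n $line ]]; do
    # ${#line} is the length in characters for the current locale.
    printf '%s %s\n' "${#line}" "$line"
  done
}

len_prefix_awk() {
  # Same output format as the awk one-liner above.
  awk '{print length($0) " " $0}'
}

printf 'ab\nhello\n' | len_prefix_bash
printf 'ab\nhello\n' | len_prefix_awk
```

Each call should print "2 ab" and "5 hello". The output of either function can be piped into someOtherProcess exactly as in the one-liner.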

Tools like awk, sed, grep, cut and the more powerful Perl are far better suited to these tasks than a shell loop that processes its input one line at a time.


A Perl solution that counts the bytes of each line:

perl -p -e 'use Encode;print length(Encode::encode_utf8($_))."\n";$_=""'

For example:

dd if=/dev/urandom bs=1M count=100 | perl -p -e 'use Encode;print length(Encode::encode_utf8($_))."\n";$_=""' | tail

runs at 7.7 MB/s for me.

To see how much of that the Perl script costs, compare with dd alone:

dd if=/dev/urandom bs=1M count=100 >/dev/null

which runs at 9.1 MB/s.

So the script does not seem that slow :)
