Is there a difference in efficiency between pipelined sed invocations and multiple sed expressions?

Short Answer

Using multiple expressions will be faster than using multiple pipelines, because there's additional overhead in creating pipelines and forking sed processes. However, it's rarely enough of a difference to matter in practice.
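For reference, here is a minimal sketch of the forms being compared, plus the semicolon-separated variant that a later answer uses. The input string and patterns are made up for illustration. For plain substitutions like these, all three should produce the same output; only the number of sed processes differs.

```shell
#!/usr/bin/env bash
# Hypothetical sample input, not taken from the question.
text='foo bar baz'

# Three separate sed processes connected by pipes.
out_pipes=$(printf '%s\n' "$text" | sed 's/foo/1/g' | sed 's/bar/2/g' | sed 's/baz/3/g')

# One sed process, one -e option per expression.
out_e=$(printf '%s\n' "$text" | sed -e 's/foo/1/g' -e 's/bar/2/g' -e 's/baz/3/g')

# One sed process, expressions separated by semicolons.
out_semi=$(printf '%s\n' "$text" | sed 's/foo/1/g;s/bar/2/g;s/baz/3/g')

printf '%s\n' "$out_pipes" "$out_e" "$out_semi"
```

Each line printed should read `1 2 3`.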

Benchmarks

Using multiple expressions is faster than multiple pipelines, but probably not enough to matter for the average use case. Using your example, the average difference in execution speed was only two-thousandths of a second, which is not enough to get excited about.

    # Average run with multiple pipelines.
    $ time {
        echo "$var1" |
        sed 's/pattern1/replacement1/g' |
        sed 's/pattern2/replacement2/g' |
        sed 's/pattern3/replacement3/g' |
        sed 's/pattern4/replacement4/g' |
        sed 's/pattern5/replacement5/g'
    }
    Some string of text

    real        0m0.007s
    user        0m0.000s
    sys         0m0.004s

    # Average run with multiple expressions.
    $ time {
        echo "$var1" | sed \
        -e 's/pattern1/replacement1/g' \
        -e 's/pattern2/replacement2/g' \
        -e 's/pattern3/replacement3/g' \
        -e 's/pattern4/replacement4/g' \
        -e 's/pattern5/replacement5/g'
    }
    Some string of text

    real        0m0.005s
    user        0m0.000s
    sys         0m0.000s

Granted, this isn't testing against a large input file, thousands of input files, or running in a loop with tens of thousands of iterations. Still, it seems safe to say that the difference is small enough to be irrelevant for most common situations.

Uncommon situations are a different story. In such cases, benchmarking will help you determine whether replacing pipes with in-line expressions is a valuable optimization for that use case.
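A single run of a few milliseconds is dominated by noise, so one way to benchmark is to repeat each variant many times and time the whole loop. The following is only a sketch: the input string, the iteration count, and the function names `piped` and `combined` are placeholders, and the patterns are the ones from the example above. It relies on bash's `time` keyword, so run it with bash rather than plain sh.

```shell
#!/usr/bin/env bash
# Hypothetical input and iteration count; substitute your real data.
input='pattern1 pattern2 pattern3'
iterations=${ITERATIONS:-200}

# Variant 1: one sed process per substitution.
piped() {
    printf '%s\n' "$input" |
        sed 's/pattern1/replacement1/g' |
        sed 's/pattern2/replacement2/g' |
        sed 's/pattern3/replacement3/g'
}

# Variant 2: a single sed process with several expressions.
combined() {
    printf '%s\n' "$input" | sed \
        -e 's/pattern1/replacement1/g' \
        -e 's/pattern2/replacement2/g' \
        -e 's/pattern3/replacement3/g'
}

# Timing is only meaningful if both variants produce the same output.
[ "$(piped)" = "$(combined)" ] || { echo "outputs differ" >&2; exit 1; }

echo "multiple pipelines, $iterations iterations:"
time for ((i = 0; i < iterations; i++)); do piped >/dev/null; done

echo "multiple expressions, $iterations iterations:"
time for ((i = 0; i < iterations; i++)); do combined >/dev/null; done
```

The iteration count can be raised through the `ITERATIONS` environment variable until the difference between the two timings is clearly larger than the run-to-run variation.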


Most of the overhead in sed tends to be in processing regular expressions, but you're processing the same number of regular expressions in each of your examples.

Consider that the operating system needs to set up stdin and stdout for each element of the pipeline. Each sed process also takes memory, and the OS must allocate that memory for every instance of sed, whether that's one instance or four.

Here's my assessment:

    $ jot -r 1000000 1 10000 | time sed 's/1/_/g' | time sed 's/2/_/g' | time sed 's/3/_/g' | time sed 's/4/_/g' >/dev/null
            2.38 real         0.84 user         0.01 sys
            2.38 real         0.84 user         0.01 sys
            2.39 real         0.85 user         0.01 sys
            2.39 real         0.85 user         0.01 sys
    $ jot -r 1000000 1 10000 | time sed 's/1/_/g;s/2/_/g;s/3/_/g;s/4/_/g' >/dev/null
            2.71 real         2.57 user         0.02 sys
    $ jot -r 1000000 1 10000 | time sed 's/1/_/g;s/2/_/g;s/3/_/g;s/4/_/g' >/dev/null
            2.71 real         2.56 user         0.02 sys
    $ jot -r 1000000 1 10000 | time sed 's/1/_/g;s/2/_/g;s/3/_/g;s/4/_/g' >/dev/null
            2.71 real         2.57 user         0.02 sys
    $ jot -r 1000000 1 10000 | time sed 's/1/_/g;s/2/_/g;s/3/_/g;s/4/_/g' >/dev/null
            2.74 real         2.57 user         0.02 sys
    $ dc
    .84 2* .85 2* + p
    3.38
    $

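And since 3.38 > 2.57, less total user CPU time is consumed if you use a single instance of sed. Note that the pipeline's wall-clock ("real") time was lower in this run, about 2.39 versus 2.71 seconds, so the saving is in CPU time rather than elapsed time.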
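The 3.38 figure is the sum of the "user" times reported for the four pipelined sed processes (0.84 + 0.84 + 0.85 + 0.85). For readers who don't use dc's reverse Polish notation, the same arithmetic can be sketched with awk; the variable name `user_total` is just for illustration.

```shell
#!/usr/bin/env bash
# Sum the four "user" times from the pipelined run shown above.
user_total=$(printf '%s\n' 0.84 0.84 0.85 0.85 |
    awk '{ sum += $1 } END { printf "%.2f\n", sum }')

echo "$user_total"
```

This total is the figure compared against the 2.57 seconds of user time reported for the single sed process.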


Yes. You'll avoid the overhead of starting sed anew each time.
