When does command substitution spawn more subshells than the same commands in isolation?

bash shell optimization subshell command-substitution

Update and caveat:

This answer has a troubled past in that I confidently claimed things that turned out not to be true. I believe it has value in its current form, but please help me eliminate other inaccuracies (or convince me that it should be deleted altogether).

I've substantially revised - and mostly gutted - this answer after @kojiro pointed out that my testing methods were flawed (I originally used ps to look for child processes, but that's too slow to always detect them); a new testing method is described below.

I originally claimed that not all bash subshells run in their own child process, but that turns out not to be true.

As @kojiro states in his answer, some shells - other than bash - DO sometimes avoid creation of child processes for subshells, so, generally speaking in the world of shells, one should not assume that a subshell implies a child process.

As for the OP's cases in bash (assumes that command{n} instances are simple commands):

    # Case #1
    command1         # NO subshell
    var=$(command1)  # 1 subshell (command substitution)

    # Case #2
    command1 | command2         # 2 subshells (1 for each pipeline segment)
    var=$(command1 | command2)  # 3 subshells: + 1 for command subst.

    # Case #3
    command1 | command2 ; var=$?         # 2 subshells (due to the pipeline)
    var=$(command1 | command2 ; echo $?) # 3 subshells: + 1 for command subst.;
                                         #   note that the extra command doesn't add
                                         #   one

It looks like using command substitution ($(...)) always adds an extra subshell in bash - as does enclosing any command in (...).
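One observable consequence of that extra subshell is that assignments made inside $(...) or (...) never reach the calling shell, whereas a { ...; } group runs in the current shell. Here is a minimal sketch of that difference; the variable names are made up for illustration and it does not count forks, it only shows the isolation:

```shell
#!/usr/bin/env bash
# Illustration only: all variable names here are hypothetical.
marker=parent

# Command substitution: the assignment happens in a subshell,
# so only the captured output comes back to the caller.
captured=$(marker=changed_in_cmdsubst; echo "$marker")

# Explicit (...) subshell: same isolation.
( marker=changed_in_parens )

# Brace group: runs in the current shell, so the assignment persists.
{ marker_group=set_in_braces; }

echo "captured=$captured"
echo "marker=$marker"
echo "marker_group=$marker_group"
```

After this runs, marker still holds its original value, while marker_group is set.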

I believe, but am not certain, that these results are correct; here's how I tested (bash 3.2.51 on OS X 10.9.1) - please tell me if this approach is flawed:

Made sure only 2 interactive bash shells were running: one to run the commands, the other to monitor.
In the 2nd shell I monitored the fork() calls in the 1st with sudo dtruss -t fork -f -p {pidOfShell1} (the -f is necessary to also trace fork() calls "transitively", i.e. to include those created by subshells themselves).
Used only the builtin : (no-op) in the test commands (to avoid muddling the picture with additional fork() calls for external executables); specifically:
- :
- $(:)
- : | :
- $(: | :)
- : | :; :
- $(: | :; :)
Only counted those dtruss output lines that contained a non-zero PID (as each child process also reports the fork() call that created it, but with PID 0).
Subtracted 1 from the resulting number, as running even just a builtin from an interactive shell apparently involves at least 1 fork().
Finally, assumed that the resulting count represents the number of subshells created.

Below is what I still believe to be correct from my original post: when bash creates subshells.

bash creates subshells in the following situations:

for an expression surrounded by parentheses ( (...) )
- except directly inside [[ ... ]], where parentheses are only used for logical grouping.

for every segment of a pipeline (|), including the first one
- Note that every subshell involved is a clone of the original shell in terms of content, even though, process-wise, a subshell can be forked from another subshell before any commands are executed.
  Thus, modifications made in the subshells of earlier pipeline segments do not affect later ones.
  (By design, commands in a pipeline are launched simultaneously - sequencing only happens through their connected stdin/stdout pipes.)
- bash 4.2+ has shell option lastpipe (OFF by default), which causes the last pipeline segment NOT to run in a subshell.

for command substitution ($(...))
for process substitution (<(...))
- typically creates 2 subshells; in the case of a simple command, @konsolebox came up with a technique to only create 1: prepend the simple command with exec (<(exec ...)).

for background execution (&)

Combining these constructs will result in more than one subshell.
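The pipeline rule, and the lastpipe exception, can be observed without tracing forks. The sketch below assumes it is run as a script, because lastpipe only takes effect when job control is inactive. It also assumes bash 4.2+ for the second half, which is skipped if shopt -s lastpipe fails. The variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Illustration only: `total`, `total_default`, `total_lastpipe` are made-up names.

# Default behavior: the while loop is the last pipeline segment and runs
# in a subshell, so its changes to `total` are lost when the pipeline ends.
total=0
printf '1\n2\n3\n' | while read -r n; do total=$((total + n)); done
total_default=$total
echo "default:  total=$total_default"

# With lastpipe (bash 4.2+, job control inactive) the last segment runs
# in the current shell, so the loop's changes survive.
if shopt -s lastpipe 2>/dev/null; then
    total=0
    printf '1\n2\n3\n' | while read -r n; do total=$((total + n)); done
    total_lastpipe=$total
    echo "lastpipe: total=$total_lastpipe"
fi
```

In an interactive shell, job control is normally active, so the second half would behave like the first unless set +m is used.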


In Bash, a subshell always executes in a new process space. You can verify this fairly trivially in Bash 4, which provides the shell variable $BASHPID in addition to the special parameter $$:

$$ Expands to the process ID of the shell. In a () subshell, it expands to the process ID of the current shell, not the subshell.
BASHPID Expands to the process id of the current bash process. This differs from $$ under certain circumstances, such as subshells that do not require bash to be re-initialized

In practice:

    $ type echo
    echo is a shell builtin
    $ echo $$-$BASHPID
    4671-4671
    $ ( echo $$-$BASHPID )
    4671-4929
    $ echo $( echo $$-$BASHPID )
    4671-4930
    $ echo $$-$BASHPID | { read; echo $REPLY:$$-$BASHPID; }
    4671-5086:4671-5087
    $ var=$(echo $$-$BASHPID ); echo $var
    4671-5006

About the only case where the shell can elide an extra subshell is when you pipe to an explicit subshell:

    $ echo $$-$BASHPID | ( read; echo $REPLY:$$-$BASHPID; )
    4671-5118:4671-5119

Here, the subshell implied by the pipe is explicitly applied, but not duplicated.
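The $$ versus $BASHPID comparison can be wrapped in a small helper for use in scripts. This is only a sketch: in_subshell and report are hypothetical names, not bash builtins, and it assumes Bash 4 or later, since BASHPID is unset in older versions and the comparison would then always succeed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bash): succeeds when the caller is
# running in a subshell. Assumes Bash 4+, where BASHPID is available.
in_subshell() {
    [ "$$" != "$BASHPID" ]
}

# Hypothetical convenience wrapper that prints a label.
report() {
    if in_subshell; then echo subshell; else echo main; fi
}

# Top level: test directly, because $(report) would itself be a subshell.
if in_subshell; then at_top=subshell; else at_top=main; fi
in_cmdsubst=$(report)

echo "top level:            $at_top"
echo "command substitution: $in_cmdsubst"
printf 'explicit subshell:    '; ( report )
printf 'pipeline segment:     '; : | report
```

Run as a script under Bash 4 or later with default options, this should report main for the top level and subshell for the other three contexts.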

This differs from some other shells, which try very hard to avoid fork-ing. Therefore, while I feel the argument made in js-shell-parse is misleading, it is true that not all shells always fork for all subshells.
