Why does ps output list the grep process after the pipe?
When you execute the command:
ps -ef | grep cron
the shell you are using (I assume bash; the color attribute of grep suggests a GNU system such as a Linux distribution, but the same applies to other Unix shells) calls pipe() to create a pipe (an in-kernel FIFO), then calls fork() to make a running copy of itself. This creates a new child process. The child close()s its standard output file descriptor (fd 1) and attaches fd 1 to the write side of the pipe created by the parent process (the shell where you executed the command). This is possible because a child created by fork() inherits copies of the parent's open file descriptors (the pipe fds in this case). The child then exec()s the first ps command found in your PATH environment variable. With the exec() call, the child process becomes the command you executed.
So you now have the shell process with one child, which in your case is the ps command with the -ef arguments.
At this point the parent (the shell) fork()s again. This newly created child close()s its standard input file descriptor (fd 0) and attaches fd 0 to the read side of the pipe created by the parent process (the shell where you executed the command). It then exec()s the first grep command found in your PATH environment variable.
Now you have the shell process with two children (siblings of each other): the first is the ps command with the -ef arguments, and the second is the grep command with the cron argument. The read side of the pipe is attached to the STDIN of grep and the write side is attached to the STDOUT of ps, so the standard output of the ps command feeds the standard input of the grep command.
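The steps above can be sketched in Python, whose os module exposes the same system calls. This is only an illustration under a few assumptions, not what any particular shell literally does: run_pipeline is a hypothetical helper name, dup2() stands in for the "close fd and re-attach" step, and a second pipe (not part of a real shell pipeline) is added so the parent can capture the final output.

```python
import os


def run_pipeline(left_argv, right_argv):
    """Run `left | right` with raw pipe/fork/dup2/exec and return right's output."""
    pipe_r, pipe_w = os.pipe()  # the pipe between the two commands
    out_r, out_w = os.pipe()    # illustration only: lets the parent capture the result

    left_pid = os.fork()
    if left_pid == 0:
        # First child: point fd 1 (stdout) at the write side, then become `left`.
        try:
            os.dup2(pipe_w, 1)
            for fd in (pipe_r, pipe_w, out_r, out_w):
                os.close(fd)
            os.execvp(left_argv[0], left_argv)  # searches PATH; raises OSError on failure
        finally:
            os._exit(127)  # reached only if exec failed

    right_pid = os.fork()
    if right_pid == 0:
        # Second child: point fd 0 (stdin) at the read side, then become `right`.
        try:
            os.dup2(pipe_r, 0)
            os.dup2(out_w, 1)
            for fd in (pipe_r, pipe_w, out_r, out_w):
                os.close(fd)
            os.execvp(right_argv[0], right_argv)
        finally:
            os._exit(127)  # reached only if exec failed

    # Parent: close its copies of the pipe ends, otherwise `right` never sees EOF.
    for fd in (pipe_r, pipe_w, out_w):
        os.close(fd)
    with os.fdopen(out_r, "rb") as captured:
        output = captured.read()
    os.waitpid(left_pid, 0)
    os.waitpid(right_pid, 0)
    return output
```

Calling run_pipeline(["ps", "-ef"], ["grep", "cron"]) on a Unix-like system should return roughly what the original command prints, usually including the grep line itself.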
Since ps is written to send information about each running process to its standard output, while grep is written to read its standard input and print whatever matches a given pattern, you have the answer to your first question:
- the shell runs:
ps -ef;
- the shell runs:
grep cron;
- ps sends data (which even contains the string "grep cron") to grep;
- grep matches its search pattern against its STDIN, and it matches the string "grep cron" because of the "cron" argument you passed to grep: you are instructing grep to match the "cron" string, and it does, because "grep cron" is a string returned by ps once grep has started its execution.
When you execute:
ps -ef | grep '[c]ron'
the argument instructs grep to match something containing "c" followed by "ron". This is like the first example, except that this time the pattern no longer matches the grep line returned by ps, because:
- the shell runs:
ps -ef;
- the shell runs:
grep [c]ron;
- ps sends data (which even contains the string "grep [c]ron") to grep;
- grep does not match that line on its STDIN, because the line does not contain "c" followed by "ron"; it contains "c" followed by "]ron".
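The difference between the two patterns can be checked without running ps at all. The sketch below uses Python's re module as a stand-in for grep (for these simple patterns the two behave the same); the ps lines are hypothetical samples and matching_lines is an illustrative helper name.

```python
import re

# Hypothetical samples of what ps reports while each pipeline is running.
PS_WITH_PLAIN_GREP = [
    "root      1079     1  0 Mar08 ?      00:00:00 cron",
    "user     23744 23001  0 21:13 pts/0  00:00:00 grep cron",
]
PS_WITH_BRACKET_GREP = [
    "root      1079     1  0 Mar08 ?      00:00:00 cron",
    "user     23744 23001  0 21:13 pts/0  00:00:00 grep [c]ron",
]


def matching_lines(pattern, lines):
    """Return the lines in which the regular expression matches somewhere."""
    return [line for line in lines if re.search(pattern, line)]


# "cron" matches both lines: the daemon and grep's own command line.
print(matching_lines("cron", PS_WITH_PLAIN_GREP))
# "[c]ron" still means c-r-o-n, but grep's command line now reads "[c]ron",
# where the "c" is followed by "]", so only the daemon line matches.
print(matching_lines("[c]ron", PS_WITH_BRACKET_GREP))
```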
GNU grep does not have any limit on the length of the string it matches, but on some platforms (I think Solaris, HP-UX, AIX) the length of the string is limited by the "$COLUMNS" variable or by the terminal's screen width.
Hopefully this long response clarifies the shell pipe process a bit.
TIP:
ps -ef | grep cron | grep -v grep
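As a rough model of what this tip does, the two grep stages can be written as two list filters. The function name and the sample lines are hypothetical, and plain substring tests stand in for grep here.

```python
def cron_without_grep(lines):
    """Mimic `grep cron | grep -v grep` on a list of lines."""
    with_cron = [line for line in lines if "cron" in line]      # grep cron
    return [line for line in with_cron if "grep" not in line]   # grep -v grep


sample = [
    "root  1079  ... cron",
    "user 23744  ... grep cron",
    "user 23800  ... vim notes-on-grep-and-cron.txt",
]
print(cron_without_grep(sample))
```

Note that the second filter drops every line containing "grep", so an unrelated process whose command line mentions grep (the third sample line) disappears as well.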
The shell constructs your pipeline with a series of fork(), pipe() and exec() calls. Depending on the shell, any part of it may be constructed first, so grep may already be running before ps even starts. Or, even if ps starts first, it will be writing into a fixed-size kernel pipe buffer (4 KiB on older systems; 64 KiB by default on current Linux) and will eventually block (while printing a line of process output) until grep starts up and begins consuming the data in the pipe. In the latter case, if ps is able to start and finish before grep even starts, you may not see the grep cron in the output. You may have noticed this non-determinism at play already.
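To see that a pipe really has a finite buffer, one can fill a pipe that nobody reads from and count the bytes it accepts. This is a small sketch with a hypothetical helper name; the write side is switched to non-blocking mode only so that the script gets an error at the point where a normal writer such as ps would simply wait. The number returned depends on the operating system and its configuration.

```python
import os


def pipe_capacity(chunk_size=1024):
    """Fill a pipe nobody reads from and return how many bytes it accepted."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # a full pipe now raises instead of hanging
    total = 0
    try:
        while True:
            total += os.write(write_fd, b"x" * chunk_size)
    except BlockingIOError:
        pass  # buffer is full: a blocking writer would sleep here until a reader drains it
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return total


print(pipe_capacity(), "bytes fit in the pipe before a writer would block")
```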
In your command
ps -ef | grep 'cron'
the shell starts grep and ps -ef as concurrent processes, and grep is usually already running by the time ps -ef lists the processes. The shell connects the standard output (STDOUT) of ps -ef to the standard input (STDIN) of grep.
It does not execute the ps command, store the result in memory, and then pass it to grep. Think about it: why would it? Imagine piping a hundred gigabytes of data.
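One way to convince yourself that the output is streamed rather than stored is to pipe a producer that never finishes. The sketch below assumes the standard yes and head utilities are installed and uses a hypothetical helper name; it runs the equivalent of `yes cron | head -n COUNT`. If the left side had to complete before the right side started, this would never return.

```python
import subprocess


def first_lines(count):
    """Run the equivalent of `yes cron | head -n COUNT` and return head's output."""
    producer = subprocess.Popen(["yes", "cron"], stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        ["head", "-n", str(count)], stdin=producer.stdout, stdout=subprocess.PIPE
    )
    producer.stdout.close()  # only the consumer holds the read side now
    output, _ = consumer.communicate()
    producer.wait()  # `yes` stops once `head` has exited and the pipe is broken
    return output.decode()


print(first_lines(3), end="")
```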
Edit: In regard to your second question:
In grep (and most regular expression engines), brackets mean that you'll accept ANY ONE of the characters inside them. So [c] accepts any character in the set, but the set contains only c. Similarly, you could use any other combination of characters.
ps aux | grep cron
root 1079 0.0 0.0 18976 1032 ? Ss Mar08 0:00 cron
root 23744 0.0 0.0 14564 900 pts/0 S+ 21:13 0:00 grep --color=auto cron
^ That matches itself, because your own command contains "cron"
ps aux | grep [c]ron
root 1079 0.0 0.0 18976 1032 ? Ss Mar08 0:00 cron
^ That matches cron, because cron contains a c followed by "ron". It does not match your own grep command, though, because its command line contains [c]ron, not cron.
You can put whatever you want in the brackets, as long as it contains the c:
ps aux | grep [cbcdefadq]ron
root 1079 0.0 0.0 18976 1032 ? Ss Mar08 0:00 cron
If you remove the c, it won't match, because "cron" starts with a c:
ps aux | grep [abedf]ron
^ Has no results
Edit 2
To reiterate the point, you can do all sorts of crazy stuff with grep. There's no significance in picking the first character to do this with.
ps aux | grep [c][ro][ro][n]
root 1079 0.0 0.0 18976 1032 ? Ss Mar08 0:00 cron
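The bracket examples above can be tried out quickly with Python's re module, which treats these simple bracket expressions the same way grep does. The helper name is hypothetical; it tests each pattern against a given word instead of real ps output.

```python
import re


def pattern_matches(pattern, text="cron"):
    """Return True if the regular expression matches somewhere in text."""
    return re.search(pattern, text) is not None


for pattern in ("[c]ron", "[cbcdefadq]ron", "[abedf]ron", "[c][ro][ro][n]"):
    print(pattern, "->", pattern_matches(pattern))

# Each bracket is matched independently, so the last pattern also accepts
# other combinations such as "coon" or "crrn".
print(pattern_matches("[c][ro][ro][n]", "coon"))
```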