Useless use of cat?


I was not aware of the award until today, when some rookie tried to pin the UUOC on me for one of my answers. It was a cat file.txt | grep foo | cut ... | cut .... I gave him a piece of my mind, and only afterwards visited the link he gave me, which describes the origins of the award and the practice of handing it out. Further searching led me to this question. Somewhat unfortunately, despite conscious consideration, none of the answers included my rationale.

I had not meant to be defensive in responding to him. After all, in my younger years, I would have written the command as grep foo file.txt | cut ... | cut ... because whenever you do the frequent single greps you learn the placement of the file argument and it is ready knowledge that the first is the pattern and the later ones are file names.

It was a conscious choice to use cat when I answered the question, partly because of a reason of "good taste" (in the words of Linus Torvalds) but chiefly for a compelling reason of function.

The latter reason is more important, so I will put it first. When I offer a pipeline as a solution, I expect it to be reusable. It is quite likely that a pipeline will be added at the end of, or spliced into, another pipeline. In that case, having a file argument to grep ruins reusability, and quite possibly does so silently, without an error message, if the file named by the argument exists. I.e., grep foo xyz | grep bar xyz | wc will give you how many lines in xyz contain bar, while you are expecting the number of lines that contain both foo and bar. Having to change the arguments to a command in a pipeline before using it is error-prone. Add to that the possibility of silent failures and it becomes a particularly insidious practice.
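The silent failure is easy to reproduce with a throwaway file. The following is only a sketch: the file name xyz comes from the example above, while the scratch directory and the sample lines are invented for the demonstration.

```shell
# Work on a scratch copy so nothing real is touched.
dir=$(mktemp -d)
xyz="$dir/xyz"

# Invented sample data: three lines contain "bar",
# but only one line contains both "foo" and "bar".
printf '%s\n' 'foo bar' 'bar only' 'another bar' 'foo only' > "$xyz"

# Spliced pipeline with a leftover file argument: the second grep ignores
# its standard input and rereads the file, so this counts the "bar" lines.
wrong=$(grep foo "$xyz" | grep bar "$xyz" | wc -l)

# Pipeline fed through cat: each grep filters what the previous stage emits.
right=$(cat "$xyz" | grep foo | grep bar | wc -l)

# Arithmetic expansion strips any padding that wc may add.
echo "with stray file argument: $((wrong))"
echo "fed through cat:          $((right))"
```

Neither pipeline prints a warning; only the numbers differ.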

The former reason is not unimportant either, since a lot of "good taste" is merely an intuitive, subconscious rationale for things like the silent failures above, which you cannot think of right at the moment when some person in need of education says "but isn't that cat useless".

However, I will try to also make conscious the former "good taste" reason I mentioned. That reason has to do with the orthogonal design spirit of Unix. grep does not cut and ls does not grep. Therefore at the very least grep foo file1 file2 file3 goes against the design spirit. The orthogonal way of doing it is cat file1 file2 file3 | grep foo. Now, grep foo file1 is merely a special case of grep foo file1 file2 file3, and if you do not treat it the same you are at least using up brain clock cycles trying to avoid the useless cat award.

That leads us to the argument that grep foo file1 file2 file3 is concatenating, and cat concatenates, so it is proper to write cat file1 file2 file3; but because cat is not concatenating anything in cat file1 | grep foo, we are violating the spirit of both cat and the almighty Unix. Well, if that were the case, then Unix would need a different command to read the contents of one file and spit them to stdout (not paginate them or anything, just a pure spit to stdout). So you would have a situation where you say cat file1 file2 or you say dog file1, and conscientiously remember to avoid cat file1 so as not to get the award, while also avoiding dog file1 file2, since hopefully the design of dog would throw an error if multiple files are specified.

Hopefully, at this point, you sympathize with the Unix designers for not including a separate command to spit a file to stdout, while also naming cat for concatenate rather than giving it some other name. <edit> removed incorrect comments on <, in fact, < is an efficient no-copy facility to spit a file to stdout which you can position at the beginning of a pipeline so the Unix designers did include something specifically for this </edit>

The next question is: why is it important to have commands that merely spit a file, or the concatenation of several files, to stdout without any further processing? One reason is to avoid requiring every single Unix command that operates on standard input to know how to parse at least one command-line file argument and use it as input if it exists. The second reason is to spare users from having to (a) remember where the filename arguments go, and (b) watch out for the silent pipeline bug mentioned above.

That brings us to why grep does have the extra logic. The rationale is to allow user fluency for commands that are used frequently and on a stand-alone basis (rather than in a pipeline). It is a slight compromise of orthogonality for a significant gain in usability. Not all commands should be designed this way, and commands that are not frequently used should avoid the extra logic of file arguments completely (remember that extra logic leads to unnecessary fragility, i.e. the possibility of a bug). The exception is to allow file arguments, as in the case of grep. (By the way, note that ls has a completely different reason to not just accept but pretty much require file arguments.)

Finally, what could have been done better is for such exceptional commands as grep (but not necessarily ls) to generate an error if standard input is also available when file arguments are specified.
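As a rough illustration of that idea, here is a sketch of a wrapper that refuses to run when it is given file arguments while its standard input is a pipe. The name grepf and its fixed "pattern first, files after" calling convention are hypothetical, invented for this example; real grep does nothing of the kind. The pipe test is also only a heuristic, since "standard input is available" has no precise meaning (input redirected from a file, for instance, is not detected).

```shell
# grepf PATTERN [FILE...]: hypothetical strict wrapper around grep.
# Refuses to run when file arguments are given AND stdin is a pipe,
# because the piped data would otherwise be silently ignored.
grepf() {
    pattern=$1
    shift
    if [ "$#" -gt 0 ] && [ -p /dev/stdin ]; then
        echo "grepf: file arguments given, but stdin is a pipe" >&2
        return 2
    fi
    grep -e "$pattern" -- "$@"
}

tmp=$(mktemp)
printf '%s\n' 'foo' 'bar' > "$tmp"

# Stand-alone use with a file argument (stdin is not a pipe here): allowed.
grepf foo "$tmp" < /dev/null

# Pure filter use, no file argument: allowed.
printf '%s\n' 'foo' 'bar' | grepf foo

# Spliced into a pipeline with a leftover file argument: refused.
printf '%s\n' 'bar' | grepf foo "$tmp" || echo "refused with status $?"
```

With such a check, the spliced pipeline from the earlier example would fail loudly instead of quietly returning the wrong count.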


Nope!

First of all, it doesn't matter where in a command the redirection happens. So if you like your redirection to the left of your command, that's fine:

< somefile command

is the same as

command < somefile
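A quick way to convince yourself, sketched with tr (which accepts no file arguments at all, so it has to read standard input). The temporary file and its contents are made up for the demonstration.

```shell
tmp=$(mktemp)
printf '%s\n' 'one' 'two' 'three' > "$tmp"

# Redirection written before the command name...
before=$(< "$tmp" tr '[:lower:]' '[:upper:]')

# ...and after it: the shell opens the file on stdin either way.
after=$(tr '[:lower:]' '[:upper:]' < "$tmp")

if [ "$before" = "$after" ]; then
    echo "same output from both forms"
fi
```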

Second, there are n + 1 processes and a subshell happening when you use a pipe. It is most decidedly slower. In some cases n would've been zero (for example, when you're redirecting to a shell builtin), so by using cat you're adding a new process entirely unnecessarily.
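The builtin case is also where the difference stops being only about speed. A small sketch with read follows; the file contents are invented, and what the piped variant prints depends on the shell, which is why no output is claimed for it.

```shell
tmp=$(mktemp)
printf '%s\n' 'alpha' 'beta' > "$tmp"

# read is a builtin: with a redirection no extra process is started,
# and the variable is set in the current shell.
read -r first < "$tmp"
echo "redirect: first=$first"

# With cat and a pipe, read is one side of a pipeline. bash and dash run it
# in a subshell by default, so the assignment is lost when the pipeline ends;
# POSIX leaves this unspecified, and ksh or zsh keep the value.
piped=
cat "$tmp" | read -r piped
echo "pipe:     piped=${piped:-(empty)}"
```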

As a generalization, whenever you find yourself using a pipe it's worth taking 30 seconds to see if you can eliminate it. (But probably not worth taking much longer than 30 seconds.) Here are some examples where pipes and processes are frequently used unnecessarily:

for word in $(cat somefile); … # for word in $(<somefile); … (or better yet, while read -r line; do …; done < somefile)
grep something | awk stuff; # awk '/something/ stuff' (similar for sed)
echo something | command; # command <<< something (although echo would be necessary for pure POSIX)
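To make two of these concrete, here is a small POSIX-only sketch. The data file, the "apple" pattern and the summing action are invented for the example; they stand in for "something" and "stuff" above.

```shell
tmp=$(mktemp)
printf '%s\n' 'apple 3' 'banana 5' 'apple 4' > "$tmp"

# Two processes: grep selects the lines, awk sums the second field.
piped=$(grep apple "$tmp" | awk '{ sum += $2 } END { print sum }')

# One process: awk's own /pattern/ does the selecting.
single=$(awk '/apple/ { sum += $2 } END { print sum }' "$tmp")

echo "grep | awk: $piped, awk alone: $single"

# Reading line by line without cat: the redirection goes on the loop,
# so the file is opened once and the loop runs in the current shell.
count=0
while read -r name qty; do
    count=$((count + 1))
done < "$tmp"
echo "lines read: $count"
```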

Feel free to edit to add more examples.


In defense of cat:

Yes,

   < input process > output 

or

   process < input > output 

is more efficient, but many invocations don't have performance issues, so you don't care.

ergonomic reasons:

We are used to reading from left to right, so a command like

    cat infile | process1 | process2 > outfile

is trivial to understand.

    process1 < infile | process2 > outfile

makes the reader jump over process1 and then read left to right. This can be remedied by:

    < infile process1 | process2 > outfile

which looks somehow as if there were an arrow pointing to the left, where nothing is. More confusing, and looking like fancy quoting, is:

    process1 > outfile < infile

And writing scripts is often an iterative process,

    cat file
    cat file | process1
    cat file | process1 | process2
    cat file | process1 | process2 > outfile

where you see your progress stepwise, while

    < file 

does not even work. Simple ways are less error-prone, and ergonomic command catenation is simple with cat.

Another point is that most people were exposed to > and < as comparison operators long before using a computer, and as programmers they encounter them far more often in that role.

And comparing two operands with < and > is contra-commutative, which means

(a > b) == (b < a)

I remember the first time using < for input redirection, I feared

a.sh < file 

could mean the same as

file > a.sh

and somehow overwrite my a.sh script. Maybe this is an issue for many beginners.

rare differences

wc -c journal.txt
15666 journal.txt
cat journal.txt | wc -c
15666

The latter can be used in calculations directly.

factor $(cat journal.txt | wc -c)
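A small sketch of that difference, using a made-up six-byte file instead of journal.txt:

```shell
tmp=$(mktemp)
printf 'hello\n' > "$tmp"    # 5 letters plus a newline

# With a file argument, wc appends the file name to the count,
# so the result is not a plain number.
with_name=$(wc -c "$tmp")
echo "file argument: $with_name"

# Reading stdin, wc has no name to print, so only the number comes back
# (some implementations pad it with spaces; arithmetic expansion copes).
bytes=$(cat "$tmp" | wc -c)
echo "twice the size: $((bytes * 2))"
```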

Of course the < can be used here too, instead of a file parameter:

< journal.txt wc -c
15666
wc -c < journal.txt
15666

but who cares - 15k?

If I occasionally ran into issues, I would surely change my habit of invoking cat.

When using very large or many, many files, avoiding cat is fine. To most questions the use of cat is orthogonal, off topic, not an issue.

Starting these useless "useless use of cat" discussions on every second shell topic is only annoying and boring. Get a life and wait for your minute of fame, when dealing with performance questions.