
Sort a text file by line length including spaces


Answer

cat testfile | awk '{ print length, $0 }' | sort -n -s | cut -d" " -f2-
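It helps to read that pipeline one stage at a time. Below is a minimal sketch that wraps it in a shell function and drops the `cat`, since awk can read the file directly. The function name `sort_by_length`, the three-line sample, and the use of `mktemp` are all assumptions made for illustration.

```shell
#!/bin/sh
# sort_by_length is a hypothetical helper name, used only for illustration.
sort_by_length() {
  # 1. Prefix each line with its length. `length` with no argument means
  #    length($0), spaces included; the comma puts OFS (one space) between
  #    the number and the untouched original line.
  awk '{ print length, $0 }' "$1" |
    # 2. Sort numerically on the prefix; -s keeps equal lengths in input order.
    sort -n -s |
    # 3. Strip the prefix: keep field 2 onward, splitting on single spaces,
    #    so runs of spaces inside the line are reproduced exactly.
    cut -d" " -f2-
}

# Hypothetical sample input; the doubled spaces should survive the round trip.
sample=$(mktemp)
printf '%s\n' 'a  b  c' 'xy' 'one two' > "$sample"
sort_by_length "$sample"   # prints: xy, then "a  b  c", then "one two"
rm -f "$sample"
```

The key point is that awk only ever prints `$0` as read, and the final stage is `cut`, which never rebuilds the line.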

Or, to do your original (perhaps unintentional) sub-sorting of any equal-length lines:

cat testfile | awk '{ print length, $0 }' | sort -n | cut -d" " -f2-

In both cases, we have solved your stated problem by moving away from awk for the final step: cut strips the length prefix without rebuilding the line, so the original spacing is preserved.

Lines of matching length - what to do in the case of a tie:

The question did not specify whether further sorting was wanted for lines of matching length. I've assumed that it is unwanted and suggested -s (--stable) to prevent such lines from being sorted against each other, keeping them in the relative order in which they occur in the input.

(Those who want more control of sorting these ties might look at sort's --key option.)
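As one example of what --key allows, here is a sketch that puts the longest lines first while still keeping equal-length lines in input order. The helper name `longest_first` and the sample input are hypothetical; `-k1,1nr` limits the key to the length prefix (numeric, reversed) and `-s` disables the whole-line fallback comparison.

```shell
#!/bin/sh
# longest_first is a hypothetical helper name.
# -k1,1nr: the key is field 1 only (the length), numeric (n), reversed (r).
# -s: no last-resort whole-line comparison, so ties stay in input order.
longest_first() {
  awk '{ print length, $0 }' "$1" | sort -k1,1nr -s | cut -d" " -f2-
}

sample=$(mktemp)
printf '%s\n' 'cc' 'a' 'bb' 'ddd' > "$sample"
longest_first "$sample"   # prints: ddd, cc, bb, a
rm -f "$sample"
```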

Why the question's attempted solution fails (awk line-rebuilding):

It is interesting to note the difference between:

echo "hello   awk   world" | awk '{print}'
echo "hello   awk   world" | awk '{$1="hello"; print}'

They yield respectively

hello   awk   world
hello awk world

The relevant section of the (gawk) manual only mentions, as an aside, that awk rebuilds the whole of $0 (using OFS, the output field separator) when you change one field. I guess it's not crazy behaviour. It has this:

"Finally, there are times when it is convenient to force awk to rebuild the entire record, using the current value of the fields and OFS. To do this, use the seemingly innocuous assignment:"

 $1 = $1   # force record to be reconstituted
 print $0  # or whatever else with $0

"This forces awk to rebuild the record."
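A small sketch of that rebuilding, using the same input as the example above. The `-` passed as OFS in the last command is an arbitrary choice, just to make the separator visible.

```shell
#!/bin/sh
line='hello   awk   world'

# No field assigned: $0 is printed exactly as it was read.
echo "$line" | awk '{ print }'                     # hello   awk   world

# Assigning any field, even to itself, rebuilds $0 from the fields joined by OFS.
echo "$line" | awk '{ $1 = $1; print }'            # hello awk world

# OFS decides what goes between the fields in the rebuilt record.
echo "$line" | awk -v OFS='-' '{ $1 = $1; print }' # hello-awk-world
```

This is why any awk-side edit of the line collapses the original runs of spaces, while `print length, $0` on its own leaves them alone.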

Test input including some lines of equal length:

aa A line   with     MORE    spaces
bb The very longest line in the file
ccb
9   dd equal len.  Orig pos = 1
500 dd equal len.  Orig pos = 2
ccz
cca
ee A line with  some       spaces
1   dd equal len.  Orig pos = 3
ff
5   dd equal len.  Orig pos = 4
g
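To compare the two variants on that input, the sketch below writes the test lines to a temporary file and runs both pipelines. Using `mktemp` is an assumption about your platform; any writable path will do.

```shell
#!/bin/sh
# Recreate the test input in a temporary file (left in place for experimenting).
testfile=$(mktemp)
printf '%s\n' \
  'aa A line   with     MORE    spaces' \
  'bb The very longest line in the file' \
  'ccb' \
  '9   dd equal len.  Orig pos = 1' \
  '500 dd equal len.  Orig pos = 2' \
  'ccz' \
  'cca' \
  'ee A line with  some       spaces' \
  '1   dd equal len.  Orig pos = 3' \
  'ff' \
  '5   dd equal len.  Orig pos = 4' \
  'g' > "$testfile"

echo '--- sort -n -s (ties keep input order) ---'
awk '{ print length, $0 }' "$testfile" | sort -n -s | cut -d" " -f2-

echo '--- sort -n (ties broken by comparing whole lines) ---'
awk '{ print length, $0 }' "$testfile" | sort -n | cut -d" " -f2-
```

With -s, the three-character lines stay as ccb, ccz, cca and the four 31-character "dd" lines come out as Orig pos 1, 2, 3, 4. Without it, the short lines become cca, ccb, ccz, and the order of the "dd" lines can depend on your locale's collation rules. Remove the file with `rm "$testfile"` when you are done.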


The AWK solution from neillb is great if you really want to use awk, and it explains why awk makes this a hassle. But if you just want to get the job done quickly and don't care which tool you use, one solution is Perl's sort() function with a custom comparison routine applied to the input lines. Here is a one-liner:

perl -e 'print sort { length($a) <=> length($b) } <>'

You can put this in your pipeline wherever you need it: it reads STDIN (from cat or a shell redirect), or you can give the filename to perl as an argument and let it open the file.

In my case I needed the longest lines first, so I swapped $a and $b in the comparison.


Try this command instead:

awk '{print length, $0}' your-file | sort -n | cut -d " " -f2-