Sort a text file by line length including spaces
Answer
cat testfile | awk '{ print length, $0 }' | sort -n -s | cut -d" " -f2-
Or, to do your original (perhaps unintentional) sub-sorting of any equal-length lines:
cat testfile | awk '{ print length, $0 }' | sort -n | cut -d" " -f2-
In both cases, we have solved your stated problem by moving away from awk for your final cut.
Lines of matching length - what to do in the case of a tie:
The question did not specify whether further sorting was wanted for lines of matching length. I've assumed that it is not, and suggested -s (--stable) to prevent such lines being sorted against each other and to keep them in the relative order in which they occur in the input.
(Those who want more control over how these ties are sorted might look at sort's --key option.)
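To make the tie handling concrete, here is a minimal sketch comparing -s with an explicit second key. The function names and the sample input are made up for illustration, and LC_ALL=C is my own assumption to keep the byte ordering predictable; adjust the second key to whatever tie-break you actually want.

```shell
# Hypothetical helpers; the names are illustrative only.

# Ties keep their input order (-s disables sort's last-resort comparison).
by_length_stable() {
  awk '{ print length, $0 }' | LC_ALL=C sort -n -s | cut -d" " -f2-
}

# Ties are broken explicitly: key 1 is the length (numeric), key 2 is the
# rest of the line, compared in reverse byte order.
by_length_then_reverse() {
  awk '{ print length, $0 }' | LC_ALL=C sort -k1,1n -k2r | cut -d" " -f2-
}

# Three lines of equal length plus one shorter line.
printf 'ccb\nccz\ncca\ng\n' | by_length_stable        # g, ccb, ccz, cca (one per line)
printf 'ccb\nccz\ncca\ng\n' | by_length_then_reverse  # g, ccz, ccb, cca (one per line)
```

In both functions the length prefix is only a temporary sort key; cut removes it again without touching the spacing of the original line.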
Why the question's attempted solution fails (awk line-rebuilding):
It is interesting to note the difference between:
echo "hello   awk   world" | awk '{print}'
echo "hello   awk   world" | awk '{$1="hello"; print}'
They yield respectively
hello   awk   world
hello awk world
The relevant section of (gawk's) manual only mentions as an aside that awk is going to rebuild the whole of $0 (based on the separator, etc) when you change one field. I guess it's not crazy behaviour. It has this:
"Finally, there are times when it is convenient to force awk to rebuild the entire record, using the current value of the fields and OFS. To do this, use the seemingly innocuous assignment:"
$1 = $1   # force record to be reconstituted
print $0  # or whatever else with $0
"This forces awk to rebuild the record."
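A small sketch of what this means for stripping the length prefix, assuming the failing step assigned to a field such as $1 (I am guessing at the exact command). The function names and the sample line are hypothetical. Editing $0 as text with sub() is one way to stay in awk without triggering the rebuild.

```shell
# Hypothetical helpers; the names are illustrative only.

# Blanking the first field assigns to $1, so awk rebuilds $0 from the
# fields joined by OFS and every run of spaces collapses to one.
strip_by_assignment() {
  awk '{ $1 = ""; print }'
}

# sub() edits $0 as a string; no field is assigned, so the original
# spacing inside the line survives.
strip_by_sub() {
  awk '{ sub(/^[0-9]+ /, ""); print }'
}

printf '19 hello   awk   world\n' | strip_by_assignment  # " hello awk world"
printf '19 hello   awk   world\n' | strip_by_sub         # "hello   awk   world"
```

Note that the assignment version also leaves a leading space, because the now-empty first field is still joined to the rest with OFS.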
Test input including some lines of equal length:
aa A line with MORE spaces
bb The very longest line in the file
ccb
9   dd equal len. Orig pos = 1
500 dd equal len. Orig pos = 2
ccz
cca
ee A line with some spaces
1   dd equal len. Orig pos = 3
ff
5   dd equal len. Orig pos = 4
g
The AWK solution from neillb is great if you really want to use awk, and it explains why it's a hassle there. But if you just want to get the job done quickly and don't care which tool you use, one option is Perl's sort() function with a custom comparison routine over the input lines. Here is a one-liner:
perl -e 'print sort { length($a) <=> length($b) } <>'
You can put this in your pipeline wherever you need it, either reading from STDIN (from cat or a shell redirect) or by giving the filename to perl as another argument and letting it open the file.
In my case I needed the longest lines first, so I swapped $a and $b in the comparison.