How to sort column A uniquely based on descending order of column B in unix/per/tcl?
awk to the rescue!
$ sort -t, -k1,1 -k2,2nr filename | awk -F, '!a[$1]++'Column A, Column Bcat,40dog,10elephant,37
if you want your specific output it needs little more coding because of the header line.
$ sort -t, -k1,1 -k2nr filename | awk -F, 'NR==1{print "999999\t"$0;next} !a[$1]++{print $2"\t"$0}' | sort -k1nr | cut -f2-Column A, Column Bcat,40elephant,37dog,10
Another alternative with removing header upfront and adding it back at the end
$ h=$(head -1 filename); sed 1d filename | sort -t, -k1,1 -k2nr | awk -F, '!a[$1]++' | sort -t, -k2nr | sed '1i'"$h"''
Perlishly:
#!/usr/bin/env perluse strict;use warnings;#print header rowprint scalar <>;my %seen;#iterate the magic filehandle (file specified on command line or #stdin - e.g. like grep/sed)while (<>) { chomp; #strip trailing linefeed #split this line on ',' my ( $key, $value ) = split /,/; #save this value if previous is lower or non existant if ( not defined $seen{$key} or $seen{$key} < $value ) { $seen{$key} = $value; }}#sort, comparing values in %seen foreach my $key ( sort { $seen{$b} <=> $seen{$a} } keys %seen ) { print "$key,$seen{$key}\n";}
I've +1'd karakfa's answer. It's simple and elegant.
My answer is an extension of karakfa's header handling. If you like it, please feel free to +1 my answer, but "best answer" should go to karakfa. (Unless of course you prefer one of the other answer! :] )
If your input is as you've described in your question, then we can recognize the header by seeing that $2 is not numeric. Thus, the following does not take the header into consideration:
$ sort -t, -k1,1 -k2,2nr filename | awk -F, '!a[$1]++'
You might alternately strip the header with:
$ sort -t, -k1,1 -k2,2nr filename | awk -F, '$2~/^[0-9]+$/&&!a[$1]++'
This slows things down quite a bit, since a regex may take longer to evaluate than a simple array assignment and numeric test. I'm using a regex for the numeric test in order to permit a 0
, which would otherwise evaluate to "false".
Next, if you want to keep the header, but print it first, you can process your output at the end of the stream:
$ sort -t, -k1,1 -k2,2nr filename | awk -F, '$2!~/^[0-9]+$/{print;next} !a[$1]++{b[$1]=$0} END{for(i in b){print b[i]}}'
Last option to achieve the same effect without storing the extra array in memory would be to process your input a second time. This is more costly in terms of IO, but less costly in terms of memory:
$ sort -t, -k1,1 -k2,2nr filename | awk -F, 'NR==FNR&&$2!~/^[0-9]+$/{print;nextfile} $2~/^[0-9]+$/&&!a[$1]++' filename -