How to sort column A uniquely based on descending order of column B in unix/per/tcl? How to sort column A uniquely based on descending order of column B in unix/per/tcl? unix unix

How to sort column A uniquely based on descending order of column B in unix/per/tcl?


awk to the rescue!

$ sort -t, -k1,1 -k2,2nr filename | awk -F, '!a[$1]++'Column A, Column Bcat,40dog,10elephant,37

if you want your specific output it needs little more coding because of the header line.

$ sort -t, -k1,1 -k2nr filename | awk -F, 'NR==1{print "999999\t"$0;next} !a[$1]++{print $2"\t"$0}' | sort -k1nr | cut -f2-Column A, Column Bcat,40elephant,37dog,10

Another alternative with removing header upfront and adding it back at the end

$ h=$(head -1 filename); sed 1d filename | sort -t, -k1,1 -k2nr | awk -F, '!a[$1]++' | sort -t, -k2nr | sed '1i'"$h"''


Perlishly:

#!/usr/bin/env perluse strict;use warnings;#print header rowprint scalar <>;my %seen;#iterate the magic filehandle (file specified on command line or #stdin - e.g. like grep/sed)while (<>) {    chomp; #strip trailing linefeed    #split this line on ','    my ( $key, $value ) = split /,/;    #save this value if previous is lower or non existant    if ( not defined $seen{$key}        or $seen{$key} < $value )    {        $seen{$key} = $value;    }}#sort, comparing values in %seen foreach my $key ( sort { $seen{$b} <=> $seen{$a} } keys %seen ) {    print "$key,$seen{$key}\n";}


I've +1'd karakfa's answer. It's simple and elegant.

My answer is an extension of karakfa's header handling. If you like it, please feel free to +1 my answer, but "best answer" should go to karakfa. (Unless of course you prefer one of the other answer! :] )

If your input is as you've described in your question, then we can recognize the header by seeing that $2 is not numeric. Thus, the following does not take the header into consideration:

$ sort -t, -k1,1 -k2,2nr filename | awk -F, '!a[$1]++'

You might alternately strip the header with:

$ sort -t, -k1,1 -k2,2nr filename | awk -F, '$2~/^[0-9]+$/&&!a[$1]++'

This slows things down quite a bit, since a regex may take longer to evaluate than a simple array assignment and numeric test. I'm using a regex for the numeric test in order to permit a 0, which would otherwise evaluate to "false".

Next, if you want to keep the header, but print it first, you can process your output at the end of the stream:

$ sort -t, -k1,1 -k2,2nr filename | awk -F, '$2!~/^[0-9]+$/{print;next} !a[$1]++{b[$1]=$0} END{for(i in b){print b[i]}}'

Last option to achieve the same effect without storing the extra array in memory would be to process your input a second time. This is more costly in terms of IO, but less costly in terms of memory:

$ sort -t, -k1,1 -k2,2nr filename | awk -F, 'NR==FNR&&$2!~/^[0-9]+$/{print;nextfile} $2~/^[0-9]+$/&&!a[$1]++' filename -