Hadoop fs -du -h sorting by size for M, G, T, P, E, Z, Y
hdfs dfs -du -h <PATH> | awk '{print $1$2,$3}' | sort -hr
Short explanation:
- The hdfs command produces the input data.
- The awk joins the first two fields (the size and its unit) into one, so each line starts with a value like 2K or 4G, followed by the path.
- The -h of sort compares such human-readable numbers, while the -r reverses the sort order.
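A minimal sketch of the same pipeline, using hypothetical sample lines in the format `hdfs dfs -du -h` emits (the sizes and paths are made up; only the awk/sort stages are the real command). Note that sort -h is a GNU coreutils feature:

```shell
# Simulated `hdfs dfs -du -h` output: size, unit, path (sample data).
printf '%s\n' \
  '1.5 G /data/logs' \
  '200 M /data/tmp' \
  '3.2 K /data/conf' |
  awk '{print $1$2,$3}' |  # join size+unit into e.g. "1.5G", keep path
  sort -hr                 # sort human-readable sizes, largest first
```

This prints /data/logs first and /data/conf last, since 1.5G > 200M > 3.2K.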
This is a rather old question, but I stumbled across it while trying to do the same thing. Because you were passing the -h (human-readable) flag, the sizes were converted to different units to make them easier for a human to read. By leaving that flag off we get the aggregate summary of file lengths in bytes.
sudo -u hdfs hadoop fs -du -s '/*' | sort -nr
Not as easy to read, but it means you can sort the output correctly with a plain numeric sort.
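A sketch of that approach with hypothetical byte counts standing in for the real `hadoop fs -du -s '/*'` output (the numbers and paths are invented; only the sort stage matches the answer):

```shell
# Simulated `hadoop fs -du -s` output without -h: bytes, then path.
printf '%s\n' \
  '1610612736 /data/logs' \
  '209715200 /data/tmp' \
  '3276 /data/conf' |
  sort -nr   # plain numeric sort, largest byte count first
```

Because every size is in the same unit (bytes), sort -nr needs no special handling of suffixes and works on any POSIX system.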
See https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/FileSystemShell.html#du for more details.
hdfs dfs -du -h <PATH> | sed 's/ //' | sort -hr
sed will strip out the first space on each line (the one between the number and the unit), after which sort will be able to understand it.
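The same hypothetical sample as above can illustrate the sed variant; only the sed/sort stages are the real command:

```shell
# Simulated `hdfs dfs -du -h` output: size, unit, path (sample data).
printf '%s\n' \
  '1.5 G /data/logs' \
  '200 M /data/tmp' \
  '3.2 K /data/conf' |
  sed 's/ //' |  # delete the first space per line: "1.5 G ..." -> "1.5G ..."
  sort -hr       # sort human-readable sizes, largest first
```

Since s/ // without the g flag replaces only the first match, the space before the path is preserved and the path column stays intact.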