Hadoop fs -du -h sorting by size for M, G, T, P, E, Z, Y
hdfs dfs -du -h <PATH> | awk '{print $1$2,$3}' | sort -hr
Short explanation:
- The hdfs command produces the input data.
- The awk joins the first two fields (the size and its unit) into one, so each line starts with a value like 2K or 4G, followed by the path.
- The -h of sort compares such human-readable numbers, while the -r reverses the sort order.
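A minimal sketch of the same pipeline, using hypothetical sample lines in the format `hdfs dfs -du -h` emits (the sizes and paths are made up; only the awk/sort stages are the real command). Note that sort -h is a GNU coreutils feature:

```shell
# Simulated `hdfs dfs -du -h` output: size, unit, path (sample data).
printf '%s\n' \
  '1.5 G /data/logs' \
  '200 M /data/tmp' \
  '3.2 K /data/conf' |
  awk '{print $1$2,$3}' |  # join size+unit into e.g. "1.5G", keep path
  sort -hr                 # sort human-readable sizes, largest first
```

This prints /data/logs first and /data/conf last, since 1.5G > 200M > 3.2K.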
This is a rather old question, but I stumbled across it while trying to do the same thing. Because you were passing the -h (human-readable) flag, the sizes were converted to different units to make them easier for a human to read. By leaving that flag off we get the aggregate summary of file lengths in bytes.
sudo -u hdfs hadoop fs -du -s '/*' | sort -nr
Not as easy to read, but it means you can sort the output correctly with a plain numeric sort.
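A sketch of that approach with hypothetical byte counts standing in for the real `hadoop fs -du -s '/*'` output (the numbers and paths are invented; only the sort stage matches the answer):

```shell
# Simulated `hadoop fs -du -s` output without -h: bytes, then path.
printf '%s\n' \
  '1610612736 /data/logs' \
  '209715200 /data/tmp' \
  '3276 /data/conf' |
  sort -nr   # plain numeric sort, largest byte count first
```

Because every size is in the same unit (bytes), sort -nr needs no special handling of suffixes and works on any POSIX system.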
See https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/FileSystemShell.html#du for more details.
hdfs dfs -du -h <PATH> | sed 's/ //' | sort -hr
sed will strip out the first space on each line (the one between the number and the unit), after which sort will be able to understand it.
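The same hypothetical sample as above can illustrate the sed variant; only the sed/sort stages are the real command:

```shell
# Simulated `hdfs dfs -du -h` output: size, unit, path (sample data).
printf '%s\n' \
  '1.5 G /data/logs' \
  '200 M /data/tmp' \
  '3.2 K /data/conf' |
  sed 's/ //' |  # delete the first space per line: "1.5 G ..." -> "1.5G ..."
  sort -hr       # sort human-readable sizes, largest first
```

Since s/ // without the g flag replaces only the first match, the space before the path is preserved and the path column stays intact.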