The way to check a HDFS directory's size?

What is the way to check an HDFS directory's size?


Prior to 0.20.203, and officially deprecated in 2.6.0:

hadoop fs -dus [directory]

Since 0.20.203 / 1.0.4, and still compatible through 2.6.0:

hdfs dfs -du [-s] [-h] URI [URI …]

You can also run hadoop fs -help for more info and specifics.


hadoop fs -du -s -h /path/to/dir displays a directory's total size in human-readable form.
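If the size is needed in a script rather than on screen, one option is to drop -h and take the first column of the summary line. This is only a minimal sketch: the function name hdfs_dir_size is made up for illustration, and it assumes the hadoop command is on the PATH and that the first column of the -du -s output is the size in bytes.

```shell
# Hypothetical helper: print the total size in bytes of one HDFS path.
hdfs_dir_size() {
    # -du -s prints a single summary line; the first column is the size in bytes.
    # -h is deliberately omitted so the value stays a plain integer.
    hadoop fs -du -s "$1" | awk 'NR == 1 { print $1 }'
}
```

It would be called as hdfs_dir_size /path/to/dir, and the result can be stored with bytes=$(hdfs_dir_size /path/to/dir).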


Extending Matt D's and others' answers: as of Apache Hadoop 3.0.0, the command can be

hadoop fs -du [-s] [-h] [-v] [-x] URI [URI ...]

It displays sizes of files and directories contained in the given directory or the length of a file in case it's just a file.

Options:

  • The -s option will result in an aggregate summary of file lengths being displayed, rather than the individual files. Without the -s option, the calculation is done by going one level deep from the given path.
  • The -h option will format file sizes in a human-readable fashion (e.g. 64.0m instead of 67108864)
  • The -v option will display the names of columns as a header line.
  • The -x option will exclude snapshots from the result calculation. Without the -x option (default), the result is always calculated from all INodes, including all snapshots under the given path.
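To make the -h example concrete, the conversion from 67108864 to 64.0m can be reproduced with a few lines of awk. This is a rough sketch of 1024-based formatting, not Hadoop's own code; the function name human_size and the unit letters are assumptions, and the exact text printed by -h may differ between Hadoop versions.

```shell
# Hypothetical helper: format a byte count using 1024-based units.
human_size() {
    awk -v bytes="$1" 'BEGIN {
        split("k m g t", units, " ")
        # Values below 1024 are printed unchanged.
        if (bytes < 1024) { print bytes; exit }
        size = bytes
        i = 0
        # Divide by 1024 until the value fits the next unit (at most "t").
        while (size >= 1024 && i < 4) { size /= 1024; i++ }
        printf "%.1f%s\n", size, units[i]
    }'
}
```

With this sketch, human_size 67108864 prints 64.0m, matching the example in the option description.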

du returns three columns with the following format:

 +-------+---------------------------------------+----------------+
 | size  | disk_space_consumed_with_all_replicas | full_path_name |
 +-------+---------------------------------------+----------------+
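Since the output is whitespace-separated, the three columns are easy to post-process. The following is a sketch only: du_summary is a hypothetical name, it assumes the command was run without -h so the first two columns are plain integers, and it assumes paths contain no spaces.

```shell
# Hypothetical helper: total up `hadoop fs -du` output read from stdin.
du_summary() {
    awk 'NF >= 3 && $1 ~ /^[0-9]+$/ {
        # Column 1: size, column 2: disk space consumed with all replicas.
        size += $1; disk += $2; n++
    }
    END {
        # Non-numeric lines (such as a -v header) were skipped by the filter above.
        printf "entries=%d size=%.0f disk=%.0f\n", n, size, disk
    }'
}
```

A typical call would be hadoop fs -du /user/hadoop/dir1 | du_summary. Dividing disk by size gives a rough idea of the effective replication factor.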

Example command:

hadoop fs -du /user/hadoop/dir1 \
    /user/hadoop/file1 \
    hdfs://nn.example.com/user/hadoop/dir1

Exit Code: Returns 0 on success and -1 on error.
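In a script, the exit code can be used to detect a missing or unreadable path. A shell reports exit statuses in the range 0 to 255, so a -1 from the command typically shows up as 255; testing for any non-zero status is the safer check. The helper name hdfs_du_or_fail below is hypothetical, and the sketch assumes hadoop is on the PATH.

```shell
# Hypothetical helper: print the du summary line, or fail with a message.
hdfs_du_or_fail() {
    # Capture the output; the `if !` branch runs on any non-zero exit status.
    if ! _du_out=$(hadoop fs -du -s "$1" 2>/dev/null); then
        echo "du failed for $1" >&2
        return 1
    fi
    printf '%s\n' "$_du_out"
}
```

For example, hdfs_du_or_fail /path/to/dir || exit 1 stops a script as soon as the path cannot be measured.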

source: Apache doc