Get the last updated folder in HDFS


With Hadoop 2.6, I got this to work with the following command:

hdfs dfs -ls -R ${DIR} | grep "^d" | sort -k6,7 | tail -1 | tr -s ' ' | cut -d' ' -f8

where,

hdfs dfs -ls -R ${DIR} : lists all files and directories under ${DIR} recursively

grep "^d" : gives only directories

sort -k6,7 : sorts them by modification date and time (fields 6 and 7), oldest first

tail -1 : keeps only the last line, i.e. the most recently modified directory

tr -s ' ' : squeezes each run of spaces into a single space, so that cut can split on it

cut -d' ' -f8 : prints only the eighth field, which is the directory path

Example:

[user@nodeX]$ hdfs dfs -ls -R /tmp/a
drwxr-xr-x   - hduser supergroup          0 2017-08-08 03:08 /tmp/a/b
drwxr-xr-x   - hduser supergroup          0 2017-08-08 03:11 /tmp/a/b/c
drwxr-xr-x   - hduser supergroup          0 2017-08-08 03:12 /tmp/a/b/c/CC
-rw-r--r--   3 hduser supergroup          0 2017-08-08 03:12 /tmp/a/b/c/CC/f2.txt
drwxr-xr-x   - hduser supergroup          0 2017-08-08 03:08 /tmp/a/b/c/d
drwxr-xr-x   - hduser supergroup          0 2017-08-08 03:08 /tmp/a/b/c/d/e
-rw-r--r--   3 hduser supergroup          6 2017-08-08 03:10 /tmp/a/b/c/f1.txt

Solution:

[user@nodeX]$ hdfs dfs -ls -R /tmp/a | grep "^d" | sort -k6,7 | tail -1 | tr -s ' ' | cut -d' ' -f8
/tmp/a/b/c/CC
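
If you need this in a script, the filtering part of the pipeline can be wrapped in a small function. This is only a sketch: the name latest_dir is made up for illustration, and it assumes the eight-column listing format shown above. Because it splits on spaces, a path that contains spaces would be cut off at the first one.

```shell
# latest_dir (hypothetical helper name): reads `hdfs dfs -ls -R` output on
# stdin and prints the path of the most recently modified directory.
# Assumes the 8-column listing format: perms, replication, owner, group,
# size, date, time, path. Paths containing spaces are truncated.
latest_dir() {
  grep '^d' |          # keep directory entries only
    sort -k6,7 |       # order by date and time, oldest first
    tail -1 |          # keep the newest entry
    tr -s ' ' |        # squeeze runs of spaces to one
    cut -d' ' -f8      # print the path column
}
```

It would then be called as: hdfs dfs -ls -R "${DIR}" | latest_dir. If ${DIR} contains no subdirectories, the function prints nothing, so check for an empty result before using it.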