Get the last updated folder in HDFS
With Hadoop 2.6, I could get it work with the following command:
hdfs dfs -ls -R ${DIR} | grep "^d" | sort -k6,7 | tail -1 | tr -s ' ' | cut -d' ' -f8
where,
hdfs dfs -ls -R ${DIR}
: gives all dirs recursively
grep "^d"
: gives only directories
sort -k6,7
: sorts them by modification time
tail -1
: gives listing for last modified directory
tr -s ' '
: some formatting
cut -d' ' -f8
: gives only directory path
Example:
[user@nodeX]$ hdfs dfs -ls -R /tmp/a drwxr-xr-x - hduser supergroup 0 2017-08-08 03:08 /tmp/a/bdrwxr-xr-x - hduser supergroup 0 2017-08-08 03:11 /tmp/a/b/cdrwxr-xr-x - hduser supergroup 0 2017-08-08 03:12 /tmp/a/b/c/CC-rw-r--r-- 3 hduser supergroup 0 2017-08-08 03:12 /tmp/a/b/c/CC/f2.txtdrwxr-xr-x - hduser supergroup 0 2017-08-08 03:08 /tmp/a/b/c/ddrwxr-xr-x - hduser supergroup 0 2017-08-08 03:08 /tmp/a/b/c/d/e-rw-r--r-- 3 hduser supergroup 6 2017-08-08 03:10 /tmp/a/b/c/f1.txt
Solution:
[user@nodeX]$ hdfs dfs -ls -R /tmp/a | grep "^d" | sort -k6,7 | tail -1 | tr -s ' ' | cut -d' ' -f8/tmp/a/b/c/CC