
Hadoop fs -get copy only specific files


Here's how we did it. Just wrote a quick shell script.

LOCAL_DIR=/tmp/txt
mkdir $LOCAL_DIR
for F in `hadoop fs -fs hdfs://namenode.mycluster -lsr / | grep '\.txt$' | awk '{print $NF}'`; do
  hadoop fs -fs hdfs://namenode.mycluster -copyToLocal $F $LOCAL_DIR
done
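One caveat: on Hadoop 2.x and later, hadoop fs -lsr is deprecated, so if the script above complains you may need to substitute hadoop fs -ls -R for the -lsr call.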


You can use a wildcard (glob) pattern to copy only the files you want. There is an example of running Hadoop from the command line at the link below. That example does not use get, but it uses put, and get should behave the same way with wildcards.

Something like this: hadoop fs -get out/*

http://prazjain.wordpress.com/2012/02/15/how-to-run-hadoop-map-reduce-program-from-command-line/
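If you only want, say, the .txt files, you can push the filtering into the glob itself. A hypothetical example (the /data and /tmp/txt paths are placeholders; quoting the pattern stops your local shell from expanding it, and note the glob matches a single directory level only):

hadoop fs -get '/data/*.txt' /tmp/txt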


Hadoop doesn't support the double-star glob notation in paths, so there is no out-of-the-box way of doing this:

hadoop fs -get /**/*.txt /tmp

You can, however, write your own code to do it: look into the current source for FsShell, and pair that with FileInputFormat's listStatus method, which can be configured to accept a PathFilter. In this PathFilter, return true only if the Path is of the file type you desire.
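For illustration, here is a minimal sketch of that idea, assuming the Hadoop 2.x Java API. It calls FileSystem.listStatus directly rather than going through FileInputFormat, and recurses manually since listStatus is not recursive; the class name TxtGet and the /tmp/txt destination are made up for the example:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

public class TxtGet {

    // PathFilter that returns true only for the file type we want.
    static final PathFilter TXT_FILTER = new PathFilter() {
        public boolean accept(Path path) {
            return path.getName().endsWith(".txt");
        }
    };

    public static void main(String[] args) throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        copyMatching(fs, new Path("/"), new Path("/tmp/txt"));
    }

    // Recursively walk dir, copying every file that passes the filter to localDir.
    static void copyMatching(FileSystem fs, Path dir, Path localDir) throws IOException {
        for (FileStatus stat : fs.listStatus(dir)) {
            if (stat.isDirectory()) {
                copyMatching(fs, stat.getPath(), localDir);
            } else if (TXT_FILTER.accept(stat.getPath())) {
                fs.copyToLocalFile(stat.getPath(), localDir); // same effect as -copyToLocal
            }
        }
    }
}

Note that the filter is applied to files only. Passing it to listStatus itself would also filter out directories whose names don't end in .txt, which would stop the recursion too early.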