How to remove all files from an HDFS location except one?


The safest option is to move the file you want to keep to another directory, delete all the remaining files in the target directory, and then move the file back.
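The move-aside approach can be sketched as a small function. This is only a sketch: `keep_one_file` and its three arguments (target directory, file name, staging directory) are hypothetical names, and it assumes the staging directory is on the same HDFS and that the target directory holds only files, not subdirectories.

```shell
#!/bin/bash
# Sketch (hypothetical helper): keep a single file in an HDFS directory.
# Usage: keep_one_file TARGET_DIR FILE_NAME STAGING_DIR
keep_one_file() {
    local dir="$1" keep="$2" staging="$3" status=0

    # 1. Move the file to keep out of the target directory.
    hadoop fs -mkdir -p "$staging" || return 1
    hadoop fs -mv "$dir/$keep" "$staging/$keep" || return 1

    # 2. Delete what is left. The glob is quoted so that HDFS expands it,
    #    not the local shell; -f suppresses the error when nothing matches.
    hadoop fs -rm -f "$dir/*" || status=$?

    # 3. Always move the kept file back, even if the delete step failed.
    hadoop fs -mv "$staging/$keep" "$dir/$keep" || return 1
    return "$status"
}
```

The kept file is never in the directory while the delete runs, so a mistake in the pattern cannot remove it.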

Otherwise, there are a couple of other ways to do the same thing.

Below is a sample shell script that deletes all the files except the one matching a pattern.

    #!/bin/bash
    echo "Executing the shell script"
    # List the directory, drop the lines matching the pattern to keep,
    # and take the path column (field 8) of each remaining line.
    for file in $(hadoop fs -ls /user/xxxx/dev/hadoop/external/csvfiles | grep -v 'a_file_pattern_to_search' | awk '{print $8}')
    do
        hadoop fs -rm "$file"
    done
    echo "Shell script ends"

The script lists all the files, then uses grep with the -v option to keep every line that does not match your pattern or filename. awk then prints the eighth column of each remaining line, which is the file path.
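Because the delete is irreversible unless HDFS trash is enabled, it can help to preview what the filter selects before piping it into rm. The sketch below wraps the filtering step in a hypothetical `paths_except` function that reads `hadoop fs -ls` output on stdin. Unlike the script above, it matches the pattern against the path only and skips directory entries. It assumes paths contain no whitespace.

```shell
#!/bin/bash
# Sketch (hypothetical helper): read `hadoop fs -ls` output on stdin and
# print the paths of files that do NOT match the given pattern.
paths_except() {
    local pattern="$1"
    # A file line has 8 columns and the path is the last one. The
    # "Found N items" header has fewer columns, and directory lines
    # start with "d"; both are skipped.
    awk 'NF >= 8 && $1 !~ /^d/ {print $8}' | grep -v -- "$pattern"
}
```

For a dry run, pipe the listing through it and inspect the result: `hadoop fs -ls /user/xxxx/dev/hadoop/external/csvfiles | paths_except 'a_file_pattern_to_search'`.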


Using the following code, I am able to remove all files from an HDFS location at once, except the file that is needed.

    file_arr=()
    for file in $(hadoop fs -ls /tmp/table_name/ | grep -v 'part-' | awk '{print $8}')
    do
        file_arr+=("$file")
    done
    hadoop fs -rm "${file_arr[@]}"
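One edge case with the array approach: if every entry matches the pattern, the array is empty and `hadoop fs -rm` is called with no paths, which fails with a usage error. A possible variant, sketched here with a hypothetical `rm_except` function, guards against that and reads the list line by line instead of relying on word splitting.

```shell
#!/bin/bash
# Sketch (hypothetical helper): delete everything in DIR whose listing
# line does not match PATTERN, using a single `hadoop fs -rm` call.
# Usage: rm_except DIR PATTERN
rm_except() {
    local dir="$1" pattern="$2" file
    local -a files=()

    # Collect the path column of every non-matching listing line.
    while IFS= read -r file; do
        # The "Found N items" header yields an empty field; skip it.
        if [ -n "$file" ]; then
            files+=("$file")
        fi
    done < <(hadoop fs -ls "$dir" | grep -v -- "$pattern" | awk '{print $8}')

    # Guard: do not call rm with an empty argument list.
    if [ "${#files[@]}" -eq 0 ]; then
        echo "Nothing to delete in $dir" >&2
        return 0
    fi
    hadoop fs -rm "${files[@]}"
}
```

With the paths from the answer above, the call would be `rm_except /tmp/table_name/ 'part-'`.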


I came up with a solution based on vikrant rana's. It does not need to run the rm command multiple times or store the files in an array, which reduces the lines of code and the effort:

    hadoop fs -ls /user/xxxx/dev/hadoop/external/csvfiles | grep -v 'a_file_pattern_to_search' | awk '{print $8}' | xargs hadoop fs -rm
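A small refinement of the xargs pipeline, assuming GNU xargs: the `-r` flag (`--no-run-if-empty`) stops xargs from running `hadoop fs -rm` at all when the filter produces no paths. The function name `rm_listed` is hypothetical. Paths containing whitespace would still be split by xargs.

```shell
#!/bin/bash
# Sketch (hypothetical helper): delete every path read from stdin with
# as few `hadoop fs -rm` calls as possible.
rm_listed() {
    # -r is a GNU extension: run nothing if stdin has no non-blank input.
    xargs -r hadoop fs -rm
}
```

It would replace the last stage of the pipeline: `... | awk '{print $8}' | rm_listed`.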