How to make Hadoop MR read only files, not folders, in the input path


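One way to achieve this is to create a custom input format by subclassing the default input format (TextInputFormat, which extends FileInputFormat), so that you can override its listStatus method. In the old org.apache.hadoop.mapred API the method is protected FileStatus[] listStatus(JobConf job); in the new org.apache.hadoop.mapreduce API it is protected List<FileStatus> listStatus(JobContext job). In your implementation, call super.listStatus(...) and simply drop every entry that is a directory, so only the plain files inside your input dir are returned.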

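Example (inside your listStatus override, where files is the FileStatus[] returned by super.listStatus(job) and newFiles is a List<FileStatus>):

    for (int i = 0; i < files.length; ++i) {
        FileStatus file = files[i];
        if (!file.isDir()) { // keep plain files, skip sub-directories
            newFiles.add(file);
        }
    }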
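The filtering rule itself can be tried without a cluster. The sketch below is a standalone Java program, not Hadoop code: it uses java.nio.file instead of Hadoop's FileSystem and FileStatus, and the class name FilesOnly, the method listFilesOnly and the file names are hypothetical. It lists the immediate children of a directory and keeps only the entries that are not directories, which is the same check the listStatus override applies to the array returned by super.listStatus(job).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Standalone sketch (no Hadoop): mirrors the listStatus override by keeping
// only the non-directory children of an input directory.
public class FilesOnly {

    // Hypothetical helper: returns the children of inputDir that are not
    // directories, sorted by name.
    static List<Path> listFilesOnly(Path inputDir) throws IOException {
        try (Stream<Path> children = Files.list(inputDir)) {
            return children
                    .filter(child -> !Files.isDirectory(child)) // same rule as !file.isDir()
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway input dir with two files and one sub-directory.
        Path dir = Files.createTempDirectory("input");
        Files.createFile(dir.resolve("part-m-00000"));
        Files.createFile(dir.resolve("part-m-00001"));
        Files.createDirectory(dir.resolve("nested"));

        for (Path p : listFilesOnly(dir)) {
            System.out.println(p.getFileName());
        }
    }
}
```

Running main should print the two part-m files and skip nested. In the real input format you would finish the override with return newFiles.toArray(new FileStatus[newFiles.size()]); so getSplits never sees a directory.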

Hope that will help you.


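Instead of using the root directory as the input path, you could use a glob such as OPFolder1/part-m*, which matches only the files in that directory whose names start with part-m. FileInputFormat expands glob patterns in input paths, so directories with other names are never picked up.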
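In an old-API driver the pattern would be passed as an ordinary path, e.g. FileInputFormat.setInputPaths(conf, new Path("OPFolder1/part-m*")). To see which names such a pattern selects, here is a small standalone Java sketch using java.nio's PathMatcher glob syntax. This is an approximation: Hadoop expands input globs with its own implementation (FileSystem.globStatus), but a trailing * behaves the same way for plain file names. The class GlobDemo, the method matching and the file names are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Standalone sketch (no Hadoop): shows which names a glob like part-m* keeps.
public class GlobDemo {

    // Hypothetical helper: returns the names that match the glob, in input order.
    static List<String> matching(List<String> names, String glob) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> out = new ArrayList<>();
        for (String name : names) {
            if (matcher.matches(Paths.get(name))) {
                out.add(name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("part-m-00000", "part-m-00001", "_SUCCESS", "nested");
        System.out.println(matching(names, "part-m*")); // prints [part-m-00000, part-m-00001]
    }
}
```

Note that a glob only filters by name: a sub-directory whose own name happened to start with part-m would still match, in which case the listStatus override above is the safer option.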