Hadoop setInputPathFilter error


Alternatively, you can skip the filter entirely: list the files in the given directory whose names begin with train (a glob pattern does this in one call) and pass the resulting paths to the job as its input paths. For example:

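        // Needs java.util.{ArrayList, List}, org.apache.hadoop.fs.{FileStatus, FileSystem, Path},
        // org.apache.hadoop.mapreduce.Job and org.apache.hadoop.mapreduce.lib.input.FileInputFormat.
        Job job = new Job(conf, "myJob");  // deprecated in Hadoop 2; Job.getInstance(conf, "myJob") replaces it
        List<Path> inputPaths = new ArrayList<Path>();
        String basePath = "/user/hadoop/path";
        FileSystem fs = FileSystem.get(conf);
        // Expand the glob so that only entries whose names start with "train" are returned.
        FileStatus[] listStatus = fs.globStatus(new Path(basePath + "/train*"));
        for (FileStatus fstat : listStatus) {
            inputPaths.add(fstat.getPath());
        }
        FileInputFormat.setInputPaths(job, inputPaths.toArray(new Path[inputPaths.size()]));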


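As a quick fix, you can blacklist paths instead of whitelisting them: for example, return false only if the path contains "test" and accept everything else. This works because the filter is also applied to the input directory itself, which a whitelist on "train" would reject.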
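To see why the blacklist helps, here is a minimal sketch that models the filter as a plain predicate over path strings, with no Hadoop dependency. It assumes that the filter is checked against the input directory before its children, which is what the directory check in the last answer also points to. The class name, the `listInputs` helper, and the sample file names are hypothetical stand-ins for illustration, not Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Toy model only: plain strings stand in for Hadoop Path objects.
class PathFilterSketch {

    // Whitelist: accept only paths whose string contains "train".
    static final Predicate<String> WHITELIST = p -> p.contains("train");

    // Blacklist: reject only paths whose string contains "test".
    static final Predicate<String> BLACKLIST = p -> !p.contains("test");

    // Assumed behaviour: the filter is checked against the input directory
    // first, and the children are examined only if the directory passes.
    static List<String> listInputs(String dir, List<String> children,
                                   Predicate<String> filter) {
        List<String> accepted = new ArrayList<>();
        if (!filter.test(dir)) {
            return accepted; // directory rejected, so nothing is listed
        }
        for (String child : children) {
            String full = dir + "/" + child;
            if (filter.test(full)) {
                accepted.add(full);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        String dir = "/user/hadoop/path";
        List<String> children = Arrays.asList("train1.txt", "train2.txt", "test1.txt");
        // The whitelist rejects the directory, so no inputs are found.
        System.out.println(listInputs(dir, children, WHITELIST));
        // The blacklist lets the directory through and drops only test1.txt.
        System.out.println(listInputs(dir, children, BLACKLIST));
    }
}
```

In this model the whitelist yields an empty list while the blacklist yields the two train files. The blacklist breaks in the same way, though, if the directory path itself happens to contain "test".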


You can get a FileSystem instance by having your filter implement the Configurable interface (or extend the Configured class) and initializing a fileSystem instance variable in the setConf method. The filter can then check whether a path is a directory before testing its name:

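        // Needs java.io.IOException, org.apache.hadoop.conf.{Configuration, Configured},
        // and org.apache.hadoop.fs.{FileSystem, Path, PathFilter}.
        class TrainFilter extends Configured implements PathFilter {
            private FileSystem fileSystem;

            @Override
            public boolean accept(Path path) {
                try {
                    // Accept directories so the input directory itself is not filtered out.
                    // isDirectory() is the Hadoop 2+ name; older releases use isDir().
                    if (fileSystem.getFileStatus(path).isDirectory()) {
                        return true;
                    }
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
                return path.toString().contains("train");
            }

            @Override
            public void setConf(Configuration conf) {
                super.setConf(conf);
                if (conf != null) {
                    try {
                        fileSystem = FileSystem.get(conf);
                    } catch (IOException e) {
                        throw new RuntimeException(e);
                    }
                }
            }
        }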