Hadoop setInputPathFilter error
Alternatively, you can loop through the files in the given directory and keep only those whose names begin with "train", for example by expanding a glob. E.g.:
```java
Job job = Job.getInstance(conf, "myJob");
List<Path> inputPaths = new ArrayList<Path>();
String basePath = "/user/hadoop/path";
FileSystem fs = FileSystem.get(conf);
// Expand the glob: every entry directly under basePath whose name starts with "train"
FileStatus[] listStatus = fs.globStatus(new Path(basePath + "/train*"));
if (listStatus != null) { // globStatus can return null when nothing exists at the path
    for (FileStatus fstat : listStatus) {
        inputPaths.add(fstat.getPath());
    }
}
FileInputFormat.setInputPaths(job, inputPaths.toArray(new Path[inputPaths.size()]));
```
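The glob approach is easy to try outside Hadoop. The sketch below is a local-filesystem analog only, assuming Java 8+ and using java.nio.file instead of Hadoop's FileSystem: Files.newDirectoryStream(dir, "train*") plays the role of fs.globStatus(new Path(basePath + "/train*")). The class name GlobInputs is hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class GlobInputs {
    // Collects every entry directly under baseDir whose name matches "train*",
    // mirroring fs.globStatus(new Path(basePath + "/train*")) on the local disk.
    static List<Path> trainInputs(Path baseDir) throws IOException {
        List<Path> inputPaths = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(baseDir, "train*")) {
            for (Path p : stream) {
                inputPaths.add(p);
            }
        }
        // Directory listing order is unspecified, so sort for a deterministic result
        inputPaths.sort(null);
        return inputPaths;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("inputs");
        for (String name : new String[] {"train1.txt", "train2.txt", "test1.txt"}) {
            Files.createFile(dir.resolve(name));
        }
        for (Path p : trainInputs(dir)) {
            System.out.println(p.getFileName());
        }
    }
}
```

As with globStatus, the pattern is matched against entry names only, so a subdirectory named like train_x would be selected too.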
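You can get a FileSystem instance inside the filter by having it implement the Configurable interface (or extend the Configured class) and initializing a fileSystem field in the setConf method. FileInputFormat instantiates the filter registered with FileInputFormat.setInputPathFilter(job, TrainFilter.class) through ReflectionUtils.newInstance, which passes the job configuration to setConf. Note that accept must be public, since it implements PathFilter.accept, and that FileSystem.get throws an IOException that setConf cannot declare: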
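```java
class TrainFilter extends Configured implements PathFilter {
    private FileSystem fileSystem;

    @Override
    public boolean accept(Path path) {
        try {
            // Let directories through (e.g. the input directory itself),
            // otherwise keep only files whose name starts with "train"
            return fileSystem.getFileStatus(path).isDirectory()
                    || path.getName().startsWith("train");
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void setConf(Configuration conf) {
        super.setConf(conf); // keeps getConf() working
        if (conf != null) { // Configured's constructor calls setConf(null)
            try {
                fileSystem = FileSystem.get(conf);
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }
    }
}
```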
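The directory check in accept matters because, as far as I know, FileInputFormat applies the configured filter to the input path itself as well as to the files listed under it, so a name-only test can reject the input directory and leave the job with no input. The sketch below reproduces that effect on the local disk with java.nio.file.DirectoryStream.Filter (Java 8+). It is an analog, not Hadoop code, and TrainFilterDemo, NAME_ONLY and TRAIN_FILTER are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class TrainFilterDemo {
    // Name-only check, like TrainFilter without the directory test
    static final DirectoryStream.Filter<Path> NAME_ONLY =
            p -> p.getFileName().toString().startsWith("train");

    // Analog of TrainFilter: accept directories, otherwise require the "train" prefix
    static final DirectoryStream.Filter<Path> TRAIN_FILTER =
            p -> Files.isDirectory(p) || NAME_ONLY.accept(p);

    // Lists the names directly under dir that the given filter accepts, sorted
    static List<String> accepted(Path dir, DirectoryStream.Filter<Path> filter) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, filter)) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        }
        names.sort(null);
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("input");
        Files.createFile(dir.resolve("train_a.txt"));
        Files.createFile(dir.resolve("test_a.txt"));
        // The input directory's own name does not start with "train"
        System.out.println(NAME_ONLY.accept(dir));       // false
        System.out.println(TRAIN_FILTER.accept(dir));    // true
        System.out.println(accepted(dir, TRAIN_FILTER)); // [train_a.txt]
    }
}
```

Checking directories first keeps the filter usable at every level of the listing, while the name test still drops non-matching files such as test_a.txt.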