merge output files after reduce phase

hadoop mapreduce

Instead of merging the files yourself, you can let Hadoop merge the reduce output files for you by calling:

hadoop fs -getmerge /output/dir/on/hdfs/ /desired/local/output/file.txt

Note: This combines the HDFS files into a single file on the local filesystem. Make sure you have enough local disk space before running it.

hadoop mapreduce

No, these files are not merged by Hadoop. The number of files you get is the same as the number of reduce tasks.

If you need the output as input for a subsequent job, don't worry about having separate files. Simply specify the entire output directory as the input for the next job.

If you do need the data outside the cluster, I usually merge the files at the receiving end when pulling the data off the cluster.

I.e. something like this:

hadoop fs -cat /some/where/on/hdfs/job-output/part-r-* > TheCombinedResultOfTheJob.txt
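If the part files have already been copied to a local directory (for example with `hadoop fs -get`), the same "merge at the receiving end" idea can be sketched in plain Java. This is only an illustration under that assumption: the class name `PartFileMerger` and the directory layout are hypothetical, and it relies on the zero-padded `part-r-NNNNN` names sorting correctly as plain file names.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: concatenates local copies of reducer output files.
public class PartFileMerger {

    /**
     * Appends every part-r-* file found directly in localDir to target,
     * in file-name order. Other files (such as _SUCCESS) are ignored.
     * Returns the number of part files that were merged.
     */
    public static int merge(Path localDir, Path target) throws IOException {
        List<Path> parts = new ArrayList<>();
        // The glob is matched against file names only.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(localDir, "part-r-*")) {
            for (Path p : stream) {
                parts.add(p);
            }
        }
        // Directory listing order is unspecified, so sort explicitly.
        Collections.sort(parts);

        // Creates the target file, or truncates it if it already exists.
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path part : parts) {
                Files.copy(part, out); // streams the file without loading it into memory
            }
        }
        return parts.size();
    }
}
```

A call such as `PartFileMerger.merge(Paths.get("job-output"), Paths.get("TheCombinedResultOfTheJob.txt"))` would then produce one combined file. The target name should not itself match `part-r-*` if it is placed in the same directory.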

hadoop mapreduce

Here is a function you can use to merge files within HDFS. It relies on FileUtil.copyMerge, which is available in Hadoop 2.x but was removed in Hadoop 3:

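import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

// Assumes the enclosing class has a Hadoop Configuration field named
// "config" and a logger field named "logger".
public boolean getMergeInHdfs(String src, String dest) throws IllegalArgumentException, IOException {
    FileSystem fs = FileSystem.get(config);
    Path srcPath = new Path(src);
    Path dstPath = new Path(dest);

    // Check that the source path exists
    if (!fs.exists(srcPath)) {
        logger.info("Path " + src + " does not exist!");
        return false;
    }

    // Check that the destination path exists
    if (!fs.exists(dstPath)) {
        logger.info("Path " + dest + " does not exist!");
        return false;
    }

    // Merge all files under srcPath; "false" keeps the source files,
    // and "null" means no separator string is added after each file.
    return FileUtil.copyMerge(fs, srcPath, fs, dstPath, false, config, null);
}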
