Hadoop writing to a new file from mapper

Are you sure you are using a single mapper? Hadoop creates a number of mappers very close to the number of input splits (more details).

The concept of the input split is important as well: it means very big data files are split into several chunks, each chunk assigned to a mapper. Thus, unless you are totally sure only one mapper is being used, you won't be able to control which part of the file you are working on, and you will not be able to maintain any kind of global index.
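As a minimal sketch (the class name and output types here are just placeholders, not anything from your job), each mapper can at least inspect which file and byte range its own split covers through the standard FileSplit API, but it still only sees that one chunk:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    // Placeholder mapper: logs which file and byte range this task was assigned.
    public class SplitAwareMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            FileSplit split = (FileSplit) context.getInputSplit();
            // This mapper only processes this file/offset range, nothing more.
            System.out.println("File: " + split.getPath()
                    + ", start offset: " + split.getStart()
                    + ", length: " + split.getLength());
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // key is the byte offset of the line within its own file,
            // not a global index across all splits or files.
            context.write(new Text(value), new LongWritable(1));
        }
    }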

That being said, using a single mapper in MapReduce is much the same as not using MapReduce at all :) Maybe the mistake is mine and I'm wrong to assume you have only one file to be analyzed; is that the case?

If you have several big data files the scenario changes, and it could make sense to dedicate a single mapper to each file; to do that you will have to create your own InputFormat (for instance by extending FileInputFormat or TextInputFormat) and override its isSplitable method so it always returns false.
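A rough sketch of that (the class name is made up; isSplitable itself is the standard FileInputFormat hook):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    // Every input file becomes exactly one split, so each file is
    // processed end-to-end by a single mapper.
    public class NonSplittableTextInputFormat extends TextInputFormat {
        @Override
        protected boolean isSplitable(JobContext context, Path file) {
            return false; // never split, regardless of file size or block size
        }
    }

You would then register it on the job with job.setInputFormatClass(NonSplittableTextInputFormat.class).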