Not able to parse input using KeyValueTextInputFormat in hadoop mapreduce


If you are using Hadoop 2.x, the parameter is

mapreduce.input.keyvaluelinerecordreader.key.value.separator

Can you share a sample of your input data?


If you are using the new API (Hadoop 2.x), the API documentation shows that the correct parameter to set is mapreduce.input.keyvaluelinerecordreader.key.value.separator.

That is, the property name starts with mapreduce, not mapred.

UPDATE: It could also be that the delimiter ':' appears more than once in your input. For example, if an input record is key1: : value1 value2 value3, then you would get something like what you describe in your question. If that is the case, choose a delimiter that appears exactly once in each record.
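To see why a repeated delimiter causes trouble, here is a minimal plain-Java sketch of the splitting rule as I understand it: the line is cut at the first occurrence of the separator only, and everything after it (including any further separators) ends up in the value. This is an imitation for illustration, not Hadoop's own code; SeparatorDemo and splitAtFirst are hypothetical names.

```java
// Plain-Java imitation of splitting a line at the FIRST separator only.
// SeparatorDemo and splitAtFirst are hypothetical names, not Hadoop classes.
class SeparatorDemo {

    // Returns {key, value}. If the separator is absent, the whole line
    // becomes the key and the value is empty.
    static String[] splitAtFirst(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        // The record from the example above, with ':' as the separator.
        String[] kv = splitAtFirst("key1: : value1 value2 value3", ':');
        // Prints: key=[key1] value=[ : value1 value2 value3]
        System.out.println("key=[" + kv[0] + "] value=[" + kv[1] + "]");
    }
}
```

Note that the value keeps the leading space and the second ':', which is probably not what the mapper expects.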


How to change the default key-value output separator in Hadoop MapReduce

For KeyValueTextInputFormat, each input line should be a key-value pair separated by a tab ("\t"):

Key1     Value1,Value2

By changing the default separator, you can read the input however you wish.

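For the new API, here is the solution:

// New API (org.apache.hadoop.mapreduce)
Configuration conf = new Configuration();
// Older releases named this property "key.value.separator.in.input.line";
// on Hadoop 2.x the name is the one below.
conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", ",");
Job job = Job.getInstance(conf); // replaces the deprecated new Job(conf)
job.setInputFormatClass(KeyValueTextInputFormat.class);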