Not able to parse input using KeyValueTextInputFormat in hadoop mapreduce
If you are using the new API (Hadoop 2.x), I see from the API that the correct parameter to set is mapreduce.input.keyvaluelinerecordreader.key.value.separator. I.e., use mapreduce instead of mapred.
UPDATE: It could also be that the delimiter ':' appears more than once in your input. For example, if an input record is key1: : value1 value2 value3, the line is split only at the first ':', so everything after it, including the second ':', ends up in the value, and you would get something like what you describe in your question. If that is the case, choose a delimiter that appears exactly once in each record.
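To make the effect of a repeated delimiter concrete, here is a minimal standalone sketch in plain Java with no Hadoop dependency. It only imitates first-occurrence splitting. The class name SeparatorSplitSketch and the method split are hypothetical names made up for this illustration and are not part of the Hadoop API.

```java
// Standalone sketch, assuming the record reader splits each line at the
// first occurrence of a single-character separator. Not Hadoop code.
class SeparatorSplitSketch {

    // Returns {key, value}. If the separator is absent, the whole line
    // becomes the key and the value is empty.
    static String[] split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        String[] kv = split("key1: : value1 value2 value3", ':');
        System.out.println("key   = [" + kv[0] + "]");
        System.out.println("value = [" + kv[1] + "]");
    }
}
```

With the record from the example, the key is key1 and the value still contains the second ':' together with the surrounding spaces, which is why a delimiter that occurs only once per record is easier to work with.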
How to change the default key-value output separator in Hadoop MapReduce
For KeyValueTextInputFormat, each input line should be a key-value pair separated by a tab ("\t") by default:
Key1 Value1,Value2
By changing the default separator, you can read the input as you wish.
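For the new API:
// New API: KeyValueTextInputFormat here is org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat
Configuration conf = new Configuration();
// The new API reads this property; the older name key.value.separator.in.input.line belongs to the old mapred API
conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", ",");
Job job = new Job(conf);
job.setInputFormatClass(KeyValueTextInputFormat.class);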