Split a file into a number of small files in HDFS
A simple Hadoop Streaming job using NLineInputFormat as the input format
can get this done.
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-<version>.jar \
    -Dmapreduce.input.lineinputformat.linespermap=10 \
    -Dmapreduce.job.reduces=0 \
    -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat \
    -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
    -input /test.txt \
    -output /splitted_output
Here the property mapreduce.input.lineinputformat.linespermap
determines the number of lines each input split must contain. Since the
number of reducers is set to 0, each map task writes its own output file,
so the input file ends up divided into files of 10 lines each.
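The chunking behaviour that linespermap controls can be sketched locally in a few lines of Python; the helper name below is illustrative only and is not part of any Hadoop API:

```python
# Local sketch of what NLineInputFormat's linespermap setting does:
# each split (and hence each mapper, and each output file when there
# are zero reducers) receives at most `lines_per_map` input lines.
def split_by_lines(lines, lines_per_map=10):
    """Group `lines` into chunks of at most `lines_per_map` lines."""
    return [lines[i:i + lines_per_map]
            for i in range(0, len(lines), lines_per_map)]

# A 25-line input with linespermap=10 yields 3 splits of 10, 10, and 5 lines.
records = [f"record {n}" for n in range(25)]
splits = split_by_lines(records, 10)
print([len(s) for s in splits])  # [10, 10, 5]
```

With the streaming job above, the same grouping happens at the input-split level, and each group becomes one part file under /splitted_output.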