Split a file into a number of small files in HDFS

A simple map-only Hadoop Streaming job that uses NLineInputFormat as its input format can do this.

hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-<version>.jar \
  -Dmapreduce.input.lineinputformat.linespermap=10 \
  -Dmapreduce.job.reduces=0 \
  -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat \
  -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
  -input /test.txt \
  -output /splitted_output

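Here the property mapreduce.input.lineinputformat.linespermap determines the number of lines each input split contains (10 in this example). Because mapreduce.job.reduces is set to 0, there is no reduce phase, so each map task writes its own output file under /splitted_output, giving one small file per split.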
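To get a feel for how many files to expect, the grouping can be mimicked locally. This is only an analogy using the coreutils split command, not the Hadoop job itself; the 25-line sample file and the part- prefix are made up for illustration.

```shell
# Local analogy only: mimic linespermap=10 with coreutils `split` (no Hadoop involved).
workdir=$(mktemp -d)

# Hypothetical 25-line input file standing in for /test.txt.
seq 1 25 > "$workdir/test.txt"

# Cut the file into chunks of at most 10 lines: part-aa, part-ab, part-ac.
split -l 10 "$workdir/test.txt" "$workdir/part-"

# Count the chunks and the lines in the final (short) chunk.
chunk_count=$(ls "$workdir"/part-* | wc -l | tr -d ' ')
last_chunk_lines=$(wc -l < "$workdir/part-ac" | tr -d ' ')

echo "chunks: $chunk_count, lines in last chunk: $last_chunk_lines"
# prints "chunks: 3, lines in last chunk: 5"
```

In the same way, a 25-line input with linespermap=10 should yield three splits, with the last one holding the remaining 5 lines.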
