Cannot read bz2 compressed file in hadoop job

xml hadoop mapreduce

Check your core-site.xml configuration file and add the BZip2 codec class to io.compression.codecs if it's absent. Here is an example:

<property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
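If you manage several configuration files, the check described above can also be scripted. The following is a minimal Python sketch, not a Hadoop tool: `ensure_bzip2_codec` is a hypothetical helper that parses a core-site.xml-style string with the standard `xml.etree.ElementTree` module and prepends `org.apache.hadoop.io.compress.BZip2Codec` to `io.compression.codecs` only when it is missing. It assumes the usual `<configuration>` root with `<property>` children, and it does not write the file back or restart any service.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

BZIP2_CODEC = "org.apache.hadoop.io.compress.BZip2Codec"
CODECS_KEY = "io.compression.codecs"


def ensure_bzip2_codec(config_xml: str) -> Tuple[str, bool]:
    """Hypothetical helper: return (updated XML, True if BZip2Codec was added)."""
    root = ET.fromstring(config_xml)  # assumes a <configuration> root element
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == CODECS_KEY:
            value_el = prop.find("value")
            if value_el is None:  # <name> present but no <value>: create one
                value_el = ET.SubElement(prop, "value")
            break
    else:
        # No io.compression.codecs property at all: append a new one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = CODECS_KEY
        value_el = ET.SubElement(prop, "value")

    # Split the comma-separated class list, ignoring whitespace and empty entries.
    codecs = [c.strip() for c in (value_el.text or "").split(",") if c.strip()]
    if BZIP2_CODEC in codecs:
        return ET.tostring(root, encoding="unicode"), False

    value_el.text = ",".join([BZIP2_CODEC] + codecs)
    return ET.tostring(root, encoding="unicode"), True
```

Calling it twice is harmless: the second call finds the codec and returns `False`. ElementTree drops XML comments by default, so review the output before replacing a real core-site.xml with it.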

Edit:

After adding the codec, run the following steps to confirm that it works (your own code may still not):

hadoop fs -mkdir /tmp/wordcount/
echo "three one three three seven" >> /tmp/words
bzip2 -z /tmp/words
hadoop fs -put /tmp/words.bz2 /tmp/wordcount/
hadoop jar /usr/lib/hadoop/hadoop-examples.jar wordcount /tmp/wordcount/ /tmp/wordcount_out/
hadoop fs -text /tmp/wordcount_out/part*
# you should see the following three lines:
# one     1
# seven   1
# three   3

# clean up
# these commands may differ in your case (newer Hadoop versions deprecate -rmr in favor of -rm -r)
hadoop fs -rmr /tmp/wordcount_out/
hadoop fs -rmr /tmp/wordcount/
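To cross-check the `hadoop fs -text` output without the cluster, you can count the words in the same .bz2 file locally. This is a minimal Python sketch using only the standard `bz2` module; `wordcount_bz2` is a hypothetical helper that only approximates the example job (whitespace tokenizing, tab-separated output sorted by word) for plain ASCII input.

```python
import bz2
from collections import Counter
from typing import List


def wordcount_bz2(path: str) -> List[str]:
    """Hypothetical helper: count whitespace-separated words in a .bz2 text file."""
    counts: Counter = Counter()
    # bz2.open in text mode decompresses transparently while we iterate lines.
    with bz2.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            counts.update(line.split())
    # Mimic the example job's output: "word<TAB>count", sorted by word.
    return [f"{word}\t{n}" for word, n in sorted(counts.items())]
```

For a fresh /tmp/words.bz2 created as above, `print("\n".join(wordcount_bz2("/tmp/words.bz2")))` prints the same three lines the job should produce. If the local count is right but the job output is garbled or empty, the problem is on the Hadoop side (codec configuration or input format), not in the file.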


In your TextInputFormat subclass you're probably overriding createRecordReader and returning a custom RecordReader<KEYIN, VALUEIN> implementation that doesn't take the compression codec into account. The default implementation returns a LineRecordReader, which handles codecs correctly. You can find a reference implementation here, and the relevant required changes here.
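The difference between a codec-aware reader and a naive one can be illustrated outside Hadoop. The sketch below is a Python analogy, not Hadoop code: `codec_aware_lines` picks a decompressor from the file suffix (loosely what Hadoop's LineRecordReader does when it looks up a codec with CompressionCodecFactory before reading lines), while `naive_lines` splits the raw bytes, which for a .bz2 file yields compressed data instead of text. Both function names and the suffix table are hypothetical.

```python
import bz2
import gzip

# Hypothetical suffix -> opener table, loosely mirroring how a codec
# is chosen from the file extension before any lines are read.
_OPENERS = {
    ".bz2": lambda p: bz2.open(p, "rt", encoding="utf-8"),
    ".gz": lambda p: gzip.open(p, "rt", encoding="utf-8"),
}


def codec_aware_lines(path):
    """Yield decoded lines, decompressing when the suffix names a known codec."""
    for suffix, opener in _OPENERS.items():
        if path.endswith(suffix):
            f = opener(path)
            break
    else:
        f = open(path, "rt", encoding="utf-8")  # uncompressed input
    with f:
        for line in f:
            yield line.rstrip("\n")


def naive_lines(path):
    """What a codec-unaware reader does: split the raw (compressed) bytes on newlines."""
    with open(path, "rb") as f:
        for raw in f:
            yield raw.rstrip(b"\n")
```

On a .bz2 file, `naive_lines` returns bytes starting with the `BZh` magic header rather than the original words, which is the kind of output a custom RecordReader that ignores the codec will feed to your mapper.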

CodeHunter
