Retrieving data from twitter using flume and storing to hdfs in JSON FORMAT


To change the output from Avro to JSON format, you have to follow a few steps:

In your Flume config file, change the property

TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource

to

TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource

com.cloudera.flume.source.TwitterSource is a custom source class that emits the tweets as raw JSON, so the records are written to HDFS in JSON format instead of Avro.
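If you would rather script the change than edit the file by hand, something like the following should work. It is only a sketch: the CONF variable and the two-line file it points to are assumptions, a throwaway temp file standing in for your real config, so point CONF at your own file once you are happy with the result.

```shell
# Assumption: a temp file stands in for the real Flume config so the sketch is safe to run.
CONF="$(mktemp)"
cat > "$CONF" <<'EOF'
TwitterAgent.sources = Twitter
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
EOF

# Rewrite only the source type line; the original file is kept as "$CONF.bak".
sed -i.bak 's|^\(TwitterAgent\.sources\.Twitter\.type\) *=.*|\1 = com.cloudera.flume.source.TwitterSource|' "$CONF"

# Show the resulting property line.
grep 'Twitter\.type' "$CONF"
```

The sed expression leaves every other property alone, and the .bak copy lets you go back to the Avro source if needed.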

To get this class, go to https://github.com/cloudera/cdh-twitter-example, download the flume-sources folder to your local machine, and build the JAR file from it.

  1. To build the flume-sources JAR:

    $ cd flume-sources
    $ mvn package
    $ cd ..

This will generate a file called flume-sources-1.0-SNAPSHOT.jar in the target directory.

  2. Add the JAR to the Flume classpath

Copy flume-sources-1.0-SNAPSHOT.jar to /usr/lib/flume-ng/plugins.d/twitter-streaming/lib/flume-sources-1.0-SNAPSHOT.jar and also to /var/lib/flume-ng/plugins.d/twitter-streaming/lib/flume-sources-1.0-SNAPSHOT.jar

If those directories do not exist, create them first:

    $ sudo mkdir -p /usr/lib/flume-ng/plugins.d/twitter-streaming/lib/
    $ sudo mkdir -p /var/lib/flume-ng/plugins.d/twitter-streaming/lib/
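The create-and-copy step can also be wrapped in a small helper. This is just a sketch: the function name install_jar and its optional ROOT argument are made up for illustration (ROOT lets you try it against a scratch directory before touching the real paths).

```shell
# install_jar JAR [ROOT]  (hypothetical helper, not part of Flume or the Cloudera project)
# Copies JAR into both Flume plugin directories, creating them if they are missing.
# ROOT is an optional prefix for a dry run; leave it empty for the real locations.
install_jar() {
  jar="$1"
  root="${2:-}"
  for base in /usr/lib/flume-ng /var/lib/flume-ng; do
    dest="$root$base/plugins.d/twitter-streaming/lib"
    mkdir -p "$dest" && cp "$jar" "$dest/" || return 1
  done
}

# Dry run against a scratch directory (no root privileges needed):
#   install_jar flume-sources/target/flume-sources-1.0-SNAPSHOT.jar /tmp/flume-test
```

For the real install, call it with no second argument from a root shell, since sudo cannot invoke a shell function directly.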

For more details, please refer to Analyzing Twitter Data Using CDH.

Hope this helps!


Events from Flume's TwitterSource are in Avro format by default. To change that, you would have to modify the source files of the TwitterSource to get the tweets in raw format (JSON). Fortunately, Cloudera has already done that here: https://github.com/cloudera/cdh-twitter-example

All you have to do is install the libraries for the new TwitterSource by following the steps in the README, then change TwitterAgent.sources.Twitter.type in the Flume config file to com.cloudera.flume.source.TwitterSource. There is an example config file in the same project.

Hope it helps