How do I load Twitter data from HDFS using Pig?
You need to register the jar below in Pig; it contains the class you are trying to access.
elephant-bird-pig-4.1.jar
EDIT: the complete steps are:
REGISTER '/home/hdfs/json-simple-1.1.jar';
REGISTER '/home/hdfs/elephant-bird-hadoop-compat-4.1.jar';
REGISTER '/home/hdfs/elephant-bird-pig-4.1.jar';

load_tweets = LOAD '/user/hdfs/twittes.txt' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS myMap;
DUMP load_tweets;
I ran the above steps on my local cluster and they work fine, so make sure you register these jars before running your LOAD.
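As a quick sanity check after the load, you can count how many JSON records were read. This is a sketch; the relation names `grouped` and `tweet_count` are just names chosen here:

```pig
-- count the loaded JSON records (sanity check)
grouped = GROUP load_tweets ALL;
tweet_count = FOREACH grouped GENERATE COUNT(load_tweets);
DUMP tweet_count;
```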
You need to register three jar files, as shown in the blog. Each jar has its own purpose:
elephant-bird-hadoop-compat-4.1.jar: utilities for dealing with Hadoop incompatibilities between 1.x and 2.x.
elephant-bird-pig-4.1.jar: the JSON loader for Pig; it loads each JSON record into a Pig map.
json-simple-1.1.1.jar: one of the JSON parsers available in Java.
After registering the jars, you can load the tweets with the following Pig script.
load_tweets = LOAD '/user/flume/tweets/' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS myMap;
After loading the tweets, you can inspect them by dumping the relation:
DUMP load_tweets;
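Because JsonLoader returns each record as a Pig map, you can project individual fields with the `#` dereference operator. A sketch, assuming the standard Twitter JSON layout where the tweet body sits under the key 'text':

```pig
-- pull the tweet body out of each JSON map
-- (the 'text' key assumes Twitter's usual JSON schema)
tweet_text = FOREACH load_tweets GENERATE (chararray) myMap#'text' AS text;
DUMP tweet_text;
```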