
Running Pig query over data stored in Hive


Here's what I found out: Using HiveColumnarLoader makes sense if you store the data as an RCFile. To load a table stored this way you need to register some jars first:

register /srv/pigs/piggybank.jar
register /usr/lib/hive/lib/hive-exec-0.5.0.jar
register /usr/lib/hive/lib/hive-common-0.5.0.jar

a = LOAD '/user/hive/warehouse/table' USING org.apache.pig.piggybank.storage.HiveColumnarLoader('ts int, user_id int, url string');
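
Assuming HiveColumnarLoader exposes the columns from the schema string above under the same names (I haven't verified this for every version), the loaded relation can then be used like any other, e.g.:

-- assumes the declared columns are available as ts, user_id and url
b = FILTER a BY user_id IS NOT NULL;
c = FOREACH b GENERATE ts, url;
DUMP c;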

To load data from a SequenceFile you have to use PiggyBank (as in the previous example). The SequenceFileLoader from PiggyBank should handle compressed files:

register /srv/pigs/piggybank.jar

DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();
a = LOAD '/user/hive/warehouse/table' USING SequenceFileLoader AS (key: int, value: int);

This doesn't work with Pig 0.7, because it's unable to read the BytesWritable type and cast it to a Pig type, and you get this exception:

2011-07-01 10:30:08,589 WARN org.apache.pig.piggybank.storage.SequenceFileLoader: Unable to translate key class org.apache.hadoop.io.BytesWritable to a Pig datatype
2011-07-01 10:30:08,625 WARN org.apache.hadoop.mapred.Child: Error running child
org.apache.pig.backend.BackendException: ERROR 0: Unable to translate class org.apache.hadoop.io.BytesWritable to a Pig datatype
    at org.apache.pig.piggybank.storage.SequenceFileLoader.setKeyType(SequenceFileLoader.java:78)
    at org.apache.pig.piggybank.storage.SequenceFileLoader.getNext(SequenceFileLoader.java:132)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:142)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:448)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:639)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:315)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
    at org.apache.hadoop.mapred.Child.main(Child.java:211)
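
One possible workaround in Pig 0.7 is a small custom LoadFunc that skips the type translation entirely and hands the raw bytes to Pig as bytearrays. The class below is only a sketch of that idea: BytesSequenceFileLoader is a name I made up, it assumes both key and value in your sequence files are BytesWritable, and I haven't run it against the data above.

import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.pig.LoadFunc;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
import org.apache.pig.data.DataByteArray;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

// Sketch of a loader that emits each SequenceFile record as (key: bytearray, value: bytearray),
// so Pig never has to translate BytesWritable itself.
public class BytesSequenceFileLoader extends LoadFunc {

    private final TupleFactory tupleFactory = TupleFactory.getInstance();
    private RecordReader<BytesWritable, BytesWritable> reader;

    @Override
    public void setLocation(String location, Job job) throws IOException {
        FileInputFormat.setInputPaths(job, location);
    }

    @SuppressWarnings("rawtypes")
    @Override
    public InputFormat getInputFormat() throws IOException {
        return new SequenceFileInputFormat<BytesWritable, BytesWritable>();
    }

    @SuppressWarnings({ "rawtypes", "unchecked" })
    @Override
    public void prepareToRead(RecordReader reader, PigSplit split) throws IOException {
        this.reader = (RecordReader<BytesWritable, BytesWritable>) reader;
    }

    @Override
    public Tuple getNext() throws IOException {
        try {
            if (!reader.nextKeyValue()) {
                return null; // end of input
            }
            BytesWritable key = reader.getCurrentKey();
            BytesWritable value = reader.getCurrentValue();
            Tuple t = tupleFactory.newTuple(2);
            // Copy only the valid bytes: a BytesWritable buffer is usually longer than getLength().
            t.set(0, new DataByteArray(Arrays.copyOf(key.getBytes(), key.getLength())));
            t.set(1, new DataByteArray(Arrays.copyOf(value.getBytes(), value.getLength())));
            return t;
        } catch (InterruptedException e) {
            throw new IOException(e);
        }
    }
}

After packaging that class into a jar, you would register it the same way as piggybank.jar and load with USING BytesSequenceFileLoader() AS (key: bytearray, value: bytearray), decoding the bytes later in the script or in a UDF.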

How to compile piggybank is described here: Unable to build piggybank -> /home/build/ivy/lib does not exist