Using Hive with Pig through HCatalog issue with TimeStamp datatype


If you are using Hive 0.13 or later, then instead of

A = LOAD 'dbname.tablename' USING org.apache.hcatalog.pig.HCatLoader();

use

A = LOAD 'dbname.tablename' USING org.apache.hive.hcatalog.pig.HCatLoader();

org.apache.hcatalog.pig.HCatLoader is now deprecated. The new class, org.apache.hive.hcatalog.pig.HCatLoader, supports the Pig datetime type and converts Hive timestamp values to it.

Note, however, that HCatLoader loses precision on these values, because the two systems represent times differently: Pig's datetime holds millisecond precision, while Hive's timestamp holds nanosecond precision.

E.g. going from Hive to Pig, the sub-millisecond part of the timestamp is lost and only millisecond precision survives.
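
To make the precision loss concrete, here is a minimal sketch in plain Java. It does not call HCatLoader at all; it only assumes that the Hive side behaves like a nanosecond-precision java.sql.Timestamp and the Pig side like a value built from epoch milliseconds. The class and method names (TimestampPrecision, toMillisPrecision, lostNanos) are made up for illustration, and it truncates rather than rounds, which may not match exactly what the loader does.

```java
import java.sql.Timestamp;

public class TimestampPrecision {

    // Rebuild the timestamp from epoch milliseconds only.
    // Assumption: this mimics handing the value to a millisecond-based type.
    public static Timestamp toMillisPrecision(Timestamp ts) {
        // getTime() includes the millisecond part of the fraction,
        // but nothing finer than that.
        return new Timestamp(ts.getTime());
    }

    // Number of nanoseconds that do not fit into millisecond precision.
    public static int lostNanos(Timestamp ts) {
        return ts.getNanos() % 1_000_000;
    }

    public static void main(String[] args) {
        // A value with all nine fractional digits set (hypothetical sample data).
        Timestamp hiveLike = Timestamp.valueOf("2014-05-01 12:34:56.123456789");
        Timestamp pigLike = toMillisPrecision(hiveLike);

        System.out.println("original:   " + hiveLike);
        System.out.println("millis only: " + pigLike);
        System.out.println("nanos lost:  " + lostNanos(hiveLike));
    }
}
```

If your downstream logic depends on the digits below one millisecond (for example, ordering events that fall within the same millisecond), you would need to carry them in a separate column rather than rely on the datetime value.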


With the Hive, Pig and HCatalog versions that you are using, the timestamp type is not supported, so there is no way to load it directly from Hive into Pig through HCatalog.

There is a workaround: create a temporary Hive table in which the column's datatype is changed from timestamp to string. You will then be able to load it into Pig as a chararray. Once the data is loaded in Pig, you can convert it to whatever type you need.


It will be supported in Hive 0.13. There is an issue covering this problem that has already been resolved; you can see it at https://issues.apache.org/jira/browse/HIVE-5814

org.apache.hcatalog.pig.HCatLoader has been deprecated since Hive 0.12. In fact, every class in org.apache.hcatalog has been deprecated. All new features are added in org.apache.hive.hcatalog, which contains all the classes and methods from org.apache.hcatalog plus the new APIs.