Can I use HCatInputFormat with MultipleInputs in Hadoop?


HCatMultipleInputs can be used to read multiple Hive tables.

There is a patch (for 0.13) that we can look at installing for multiple-table support. It adds an HCatMultipleInputs class that accepts several Hive tables as inputs.

https://issues.apache.org/jira/i#browse/HIVE-4997

Example usage: HCatMultipleInputs.addInput(job, Table1, db1, properties1, Mapper1.class);

Working code is available at the link below: https://github.com/abhirj87/training/tree/master/multipleinputs


The solution here is apparently either to upgrade to 0.14.0 (or patch the older version), or to skip HCatalog altogether: read the metastore directly and manually add each partition subdirectory to MultipleInputs.
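For the second option, here is a rough sketch of how the partition subdirectories could be built before handing them to MultipleInputs. It is only an illustration under some assumptions: PartitionPaths and partitionDirs are hypothetical names (not part of Hadoop or HCatalog), the table location and partition values are made up, and the table is assumed to use Hive's default key=value directory layout with no custom partition locations and no special characters that Hive would escape in directory names. In a real job the location and partition specs would come from the metastore rather than being hard-coded.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hadoop or HCatalog.
class PartitionPaths {

    // Builds one directory path per partition, assuming Hive's default layout:
    //   <tableLocation>/<key1>=<value1>/<key2>=<value2>/...
    // Each map must list its keys in the table's partition-column order
    // (use an insertion-ordered map such as LinkedHashMap).
    static List<String> partitionDirs(String tableLocation,
                                      List<Map<String, String>> partitions) {
        // Drop a trailing slash so the joined paths do not contain "//".
        String base = tableLocation.endsWith("/")
                ? tableLocation.substring(0, tableLocation.length() - 1)
                : tableLocation;
        List<String> dirs = new ArrayList<>();
        for (Map<String, String> spec : partitions) {
            StringBuilder dir = new StringBuilder(base);
            for (Map.Entry<String, String> kv : spec.entrySet()) {
                dir.append('/').append(kv.getKey()).append('=').append(kv.getValue());
            }
            dirs.add(dir.toString());
        }
        return dirs;
    }

    public static void main(String[] args) {
        // Made-up partition specs standing in for what the metastore would return.
        Map<String, String> first = new LinkedHashMap<>();
        first.put("dt", "2014-01-01");
        first.put("region", "eu");
        Map<String, String> second = new LinkedHashMap<>();
        second.put("dt", "2014-01-02");
        second.put("region", "us");

        List<Map<String, String>> partitions = new ArrayList<>();
        partitions.add(first);
        partitions.add(second);

        // Made-up table location; print one directory per partition.
        for (String dir : partitionDirs("/warehouse/db1.db/table1", partitions)) {
            System.out.println(dir);
        }
    }
}
```

Each resulting directory could then be registered in the job driver with MultipleInputs.addInputPath(job, new Path(dir), inputFormatClass, mapperClass) from org.apache.hadoop.mapreduce.lib.input, where the input format has to match how the table is actually stored.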

Personally, since I can't upgrade easily and handling the subpartitions is too much work, I just focused on optimising the jobs in other ways and am content with running a sequence of jobs for now.


Is there a way to apply the patch on its own in a separate MapReduce program? It seems the patch is still not committed, but I want to use the solution in my job.