Can I use HCatInputFormat with MultipleInputs in Hadoop?
HCatMultipleInputs can be used for reading multiple Hive tables.
Here is a patch (for 0.13) that we can look at installing for multiple-table support. It adds HCatMultipleInputs to support multiple Hive tables.
https://issues.apache.org/jira/i#browse/HIVE-4997
Example usage: HCatMultipleInputs.addInput(job, Table1, db1, properties1, Mapper1.class);
You can use the working code at the link below: https://github.com/abhirj87/training/tree/master/multipleinputs
The solution here is apparently to either upgrade to 0.14.0 (or patch the old version), or to skip HCatalog, read the metastore directly, and manually add each partition subdirectory to MultipleInputs.
Personally, since I can't upgrade easily and the subpartitioning is too much work, I just focused on optimising the jobs in other ways and am content with running a sequence of jobs for now.
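For anyone who does want to try the "add each partition subdirectory" route, here is a rough sketch of the path-building part only. It assumes the table uses Hive's default partition layout (key=value directories under the table location) and that partition values need no escaping. The class name PartitionPaths and the method partitionDirs are made up for illustration and are not part of Hive or Hadoop. In a real job the partition locations should come from the metastore itself rather than being rebuilt by hand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hive or Hadoop.
// Assumption: default Hive layout <tableLocation>/<key1>=<val1>/<key2>=<val2>/...
final class PartitionPaths {
    private PartitionPaths() {}

    // Builds one directory string per partition spec.
    // Each spec must iterate in partition-key order (e.g. a LinkedHashMap).
    static List<String> partitionDirs(String tableLocation,
                                      List<? extends Map<String, String>> partitions) {
        // Drop trailing slashes so the joined path has no "//" in it.
        String base = tableLocation.replaceAll("/+$", "");
        List<String> dirs = new ArrayList<>();
        for (Map<String, String> spec : partitions) {
            StringBuilder dir = new StringBuilder(base);
            for (Map.Entry<String, String> kv : spec.entrySet()) {
                dir.append('/').append(kv.getKey()).append('=').append(kv.getValue());
            }
            dirs.add(dir.toString());
        }
        return dirs;
    }
}
```

Each returned directory would then be registered with something like MultipleInputs.addInputPath(job, new Path(dir), TextInputFormat.class, Mapper1.class), where TextInputFormat is only a guess at the table's storage format and Mapper1 is the mapper from the example above.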