How to load an XML file into Hive
You have several options:
- Load the XML into a Hive table with a string column, one XML file per row (e.g. `CREATE TABLE xmlfiles (id int, xmlfile string)`). Then use an XPath UDF to do work on the XML.
- Since you know the XPaths of what you want (e.g. `//section1`), follow the instructions in the second half of this tutorial to ingest directly into Hive via XPath.
- Map your XML to Avro as described here, because a SerDe exists for seamless Avro-to-Hive mapping.
- Use XPath to store your data in a regular text file in HDFS and then ingest that into Hive.
Which option is best depends on your level of experience and comfort with these approaches.
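
As a rough illustration of the first option, here is a minimal HiveQL sketch using Hive's built-in XPath UDFs (`xpath_string` and `xpath`). The tab delimiter, the file path `/tmp/xmlfiles.tsv`, and the output column names are assumptions made for the example, not something from your setup. It also assumes each XML document has been flattened onto a single line, since a text-format table treats every newline as a new row.

```sql
-- Table from the first option: one XML document per row.
-- The tab delimiter is an assumption for this sketch.
CREATE TABLE xmlfiles (id INT, xmlfile STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- Hypothetical HDFS path; each line holds "<id><TAB><xml on one line>".
LOAD DATA INPATH '/tmp/xmlfiles.tsv' INTO TABLE xmlfiles;

-- xpath_string returns the text of the first node matching the expression.
SELECT id,
       xpath_string(xmlfile, '//section1') AS section1_text
FROM xmlfiles;

-- xpath returns an array<string> with the text of every matching node.
SELECT id,
       xpath(xmlfile, '//section1/text()') AS section1_values
FROM xmlfiles;
```

If a document contains several `section1` elements, the second query keeps all of them, while `xpath_string` only gives you the first.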