
Use elephant-bird with hive to read protobuf data


The problem has been solved.

At first I put the protobuf binary data directly into HDFS, and my Hive query returned no results.

It doesn't work that way.

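When I asked some senior colleagues, they said the protobuf binary data should be written into a container file format, such as a Hadoop SequenceFile. Raw protobuf messages carry no delimiters, so a file of concatenated messages cannot be split back into individual records.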

The elephant-bird documentation describes this too, but at first I didn't fully understand it.

After writing the protobuf binary data into a SequenceFile, I could read the protobuf data with Hive.

Because the data is stored in SequenceFile format, I used this clause in the CREATE TABLE statement:

stored as
  inputformat 'org.apache.hadoop.mapred.SequenceFileInputFormat'
  outputformat 'org.apache.hadoop.mapred.SequenceFileOutputFormat'
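A complete table definition also needs the elephant-bird SerDe, which turns each stored message into columns. The statement below is only a minimal sketch under assumptions: the jar paths, table name, HDFS location and the message class com.example.proto.gen.Storage$Person are hypothetical placeholders. The SerDe class and the 'serialization.class' property follow elephant-bird's Hive documentation as I understand it, so check them against the elephant-bird version you use. It also assumes each SequenceFile value is a BytesWritable holding exactly one serialized message.

```sql
-- Hypothetical jar paths: elephant-bird jars plus the jar with your generated protobuf classes.
ADD JAR /path/to/elephant-bird-core.jar;
ADD JAR /path/to/elephant-bird-hive.jar;
ADD JAR /path/to/elephant-bird-hadoop-compat.jar;
ADD JAR /path/to/my-protobuf-classes.jar;

-- Hypothetical table over SequenceFiles whose values are serialized protobuf messages.
-- No column list: the deserializer derives the columns from the protobuf message definition.
CREATE EXTERNAL TABLE person_pb
  ROW FORMAT SERDE 'com.twitter.elephantbird.hive.serde.ProtobufDeserializer'
  WITH SERDEPROPERTIES (
    'serialization.class' = 'com.example.proto.gen.Storage$Person')
  STORED AS
    INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileOutputFormat'
  LOCATION '/user/hive/external/person_pb';  -- hypothetical HDFS directory

-- Quick check that rows come back.
SELECT * FROM person_pb LIMIT 10;
```

Using an external table here means dropping the table later leaves the SequenceFiles in HDFS untouched, which is convenient when the files are produced by a separate job.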

I hope this helps others who are new to Hadoop, Hive, and elephant-bird.