How to read a JSON column as a field in Apache PIG


I think you need to use Twitter's Elephant Bird library to parse a single JSON column in Pig. (If you wanted to parse files that contain only JSON, you could simply use Pig's built-in JsonLoader.)

Here is a related question. It looks like your JSON is also an array, so what's written there should apply to you, too.

In case that doesn't work, here's a blog post describing how to write a Python UDF for a more specific case of JSON parsing. You can of course do the same thing with a Java UDF.
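To give a rough idea of what such a Python UDF might look like, here is a minimal sketch. It assumes the column holds a JSON array of objects and that you want one key from each object; the key `name`, the function name `extract_names`, and the output schema are all made up for illustration, so adjust them to your data. It is written for Pig's `streaming_python` (CPython) UDF mode, where `outputSchema` is imported from `pig_util`; the fallback decorator is only there so the file can also be run and tested outside Pig.

```python
import json

try:
    # Available when Pig runs the script as a streaming_python UDF.
    from pig_util import outputSchema
except ImportError:
    # Fallback so the function can be tested outside Pig (no-op decorator).
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap


@outputSchema("names:chararray")
def extract_names(json_text):
    """Return the "name" values from a JSON array of objects, comma-separated.

    "name" is a hypothetical key; returns None for null or malformed input.
    """
    if json_text is None:
        return None
    try:
        records = json.loads(json_text)
    except (ValueError, TypeError):
        # Not valid JSON (or not a string): let Pig see a null.
        return None
    if not isinstance(records, list):
        return None
    names = [str(r["name"]) for r in records
             if isinstance(r, dict) and "name" in r]
    return ",".join(names)
```

If this were saved as, say, `json_udf.py`, you should be able to register it with something like `REGISTER 'json_udf.py' USING streaming_python AS json_udf;` and then call `json_udf.extract_names(your_json_column)` inside a `FOREACH ... GENERATE`. Returning a delimited string rather than a bag is just to keep the sketch simple.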