Hive: parsing JSON Hive: parsing JSON hadoop hadoop

Hive: parsing JSON


You can use get_json_object:

 select get_json_object(fieldname, '$.country'),         get_json_object(fieldname, '$.data.ad.s') from ... 

You will get better performance with json_tuple but I found a "how to" to get the values in json inside json;To formating your table you can use something like this:

from table t lateral view explode( split(regexp_replace(get_json_object(ln, ''$.data.ad.s'), '\\[|\\]', ''), ',' ) ) tb1 as sthis code above will transform you "Array" in a column.

form more: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

I hope this help ...


Here is what you can quickly try , I would suggest to use Json-Ser-De.

nano /tmp/hive-parsing-json.json

{"country":"US","page":227,"data":{"ad":{"impressions":{"s":10,"o":10}}}}

Create base table :

hive > CREATE TABLE hive_parsing_json_table ( json string );

Load json file to Table :

hive > LOAD DATA LOCAL INPATH  '/tmp/hive-parsing-json.json' INTO TABLE hive_parsing_json_table;

Query the table :

hive >  select v1.Country, v1.Page, v4.impressions_s, v4.impressions_o from hive_parsing_json_table hpjp     LATERAL VIEW json_tuple(hpjp.json, 'country', 'page', 'data') v1     as Country, Page, data     LATERAL VIEW json_tuple(v1.data, 'ad') v2     as Ad     LATERAL VIEW json_tuple(v2.Ad, 'impressions') v3     as Impressions     LATERAL VIEW json_tuple(v3.Impressions, 's' , 'o') v4     as impressions_s,impressions_o;  

Output :

v1.country  v1.page     v4.impressions_s    v4.impressions_oUS      227     10          10


Using hive native json-serde('org.apache.hive.hcatalog.data.JsonSerDe') you can do this.. here are the steps

ADD JAR /path/to/hive-hcatalog-core.jar;

create a table as below  CREATE TABLE json_serde_nestedjson (  country string,  page int,  data struct < ad: struct < impressions: struct < s:int, o:int  > > >)ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';

then load data(stored in file)

LOAD DATA LOCAL INPATH '/tmp/nested.json' INTO TABLE json_serde_nestedjson;

then get required data using

SELECT country, page, data.ad.impressions.s, data.ad.impressions.o FROM json_serde_nestedjson;