Can I use mrjob python library on partitioned hive tables?

Can I use the mrjob Python library on partitioned Hive tables?


As Alex stated, mrjob currently does not work with Avro-formatted files. However, there is a way to run Python code on Hive tables directly (no mrjob needed, though unfortunately with some loss of flexibility). In the end, I added the Python file as a resource in Hive by executing "ADD FILE mapper.py" (the file name has to match the script named in the USING clause) and ran a SELECT clause with TRANSFORM ... USING ..., storing the mapper's results in a separate table. Example Hive query:

INSERT OVERWRITE TABLE u_data_new
SELECT
  TRANSFORM (userid, movieid, rating, unixtime)
  USING 'python weekday_mapper.py'
  AS (userid, movieid, rating, weekday)
FROM u_data;
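For reference, here is a minimal sketch of what a script like weekday_mapper.py could look like. It is not necessarily the script from the linked example. It assumes that Hive passes each row to the script on stdin as one tab-separated line and reads tab-separated output lines back, and that unixtime holds seconds since the epoch. The function name map_line and the choice of UTC for the weekday calculation are my own assumptions.

```python
import sys
from datetime import datetime, timezone


def map_line(line):
    """Turn 'userid<TAB>movieid<TAB>rating<TAB>unixtime' into
    'userid<TAB>movieid<TAB>rating<TAB>weekday'.

    weekday is the ISO weekday (1 = Monday ... 7 = Sunday), computed in UTC
    (an assumption; use another timezone if your data calls for it).
    """
    userid, movieid, rating, unixtime = line.rstrip("\n").split("\t")
    moment = datetime.fromtimestamp(float(unixtime), tz=timezone.utc)
    return "\t".join([userid, movieid, rating, str(moment.isoweekday())])


def main():
    # Hive streams the selected columns to stdin, one row per line,
    # and reads the transformed rows from stdout.
    for line in sys.stdin:
        if line.strip():  # skip blank lines
            print(map_line(line))


if __name__ == "__main__":
    main()
```

The script can be checked locally before adding it to Hive, for example with printf '196\t242\t3\t881250949\n' | python weekday_mapper.py, which should print the same row with 4 (Thursday) in the last column.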

Full example is available here (at the bottom): link