How do I read an ORC file in Hadoop streaming?


Here is an example in which I use a partitioned Hive table stored as ORC as the input:

    hadoop jar /usr/hdp/2.2.4.12-1/hadoop-mapreduce/hadoop-streaming-2.6.0.2.2.4.12-1.jar \
        -libjars /usr/hdp/current/hive-client/lib/hive-exec.jar \
        -Dmapreduce.task.timeout=0 -Dmapred.reduce.tasks=1 \
        -Dmapreduce.job.queuename=default \
        -file RStreamMapper.R RStreamReducer2.R \
        -mapper "Rscript RStreamMapper.R" -reducer "Rscript RStreamReducer2.R" \
        -input /hive/warehouse/asv.db/rtd_430304_fnl2 \
        -output /user/Abhi/MRExample/Output \
        -inputformat org.apache.hadoop.hive.ql.io.orc.OrcInputFormat \
        -outputformat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat

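Here the path passed to -input is the HDFS directory where the Hive table stores its ORC data. It sits under the Hive warehouse directory (for example /apps/hive/warehouse/asv.db/rtd_430304_fnl2), so use whichever location applies on your cluster. Beyond that, I only need to supply the appropriate jars: the Hadoop streaming jar itself and, via -libjars, Hive's hive-exec.jar, which provides the ORC input and output format classes.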
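The mapper still has to parse whatever the ORC input format hands it on stdin. The following is only a rough sketch in shell rather than R, and it rests on an assumption not stated above: that each record arrives as a key, a tab, and then the row rendered as text in the form `{v1, v2, v3}`. The function name `orc_row_to_tsv` is made up for illustration. Check a few real records from your own job before relying on this layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop or Hive).
# ASSUMPTION: each streaming record looks like
#   <key><TAB>{v1, v2, v3}
# Converts such records on stdin into plain tab-separated fields on stdout.
orc_row_to_tsv() {
    awk '{
        # Drop the key: keep everything after the first tab.
        # If there is no tab, index() returns 0 and the whole line is kept.
        row = substr($0, index($0, "\t") + 1)

        # Strip the surrounding braces if both are present.
        if (substr(row, 1, 1) == "{" && substr(row, length(row), 1) == "}") {
            row = substr(row, 2, length(row) - 2)
        }

        # Turn ", " separators into tabs.
        # Caveat: a value that itself contains ", " would be split incorrectly.
        gsub(/, /, "\t", row)
        print row
    }'
}
```

Something like `printf '(null)\t{1, foo, 3.5}\n' | orc_row_to_tsv` can be used to try it locally. The same three steps (drop the key, strip the braces, split on the separator) would carry over to an R mapper such as RStreamMapper.R.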