ORC file format with Impala


ORC is not supported in Impala. Instead, Apache Parquet is the recommended format for best performance.


Impala cannot read the ORC file format. If you can, I would suggest migrating your ORC files to Parquet with Hive. The advantage is that you pay the cost of setting up the MapReduce tasks only once, at conversion time.

If your ORC table is nameoforctable, then a very basic query looks like:

CREATE TABLE nameoforctable_parquet
LIKE nameoforctable
STORED AS PARQUET
LOCATION '/your/hdfs/location';

INSERT INTO nameoforctable_parquet
SELECT * FROM nameoforctable;
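Once the Hive job has finished, Impala still has to learn about the new table before it can query it. A minimal sketch of that follow-up step, assuming the statements are run from impala-shell and the table is named nameoforctable_parquet as in the example above (the row-count check and the statistics step are suggestions, not part of the original answer):

```sql
-- Run in impala-shell after the Hive INSERT has completed.

-- Tables created through Hive are not visible to Impala until its
-- metadata cache is refreshed for that table.
INVALIDATE METADATA nameoforctable_parquet;

-- Sanity check: compare this count with the row count Hive reports
-- for the original ORC table.
SELECT COUNT(*) FROM nameoforctable_parquet;

-- Optional: collect table and column statistics for the Impala planner.
COMPUTE STATS nameoforctable_parquet;
```

If more data is later added to the Parquet table through Hive, REFRESH nameoforctable_parquet; is normally enough to pick up the new files.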


ORC is the only format that supports the ACID feature in Hive, and it has demonstrated better query performance and compression ratios in some benchmarking studies. Even so, Impala doesn't support the ORC file format, apparently because ORC was created by Hortonworks, one of Cloudera's major competitors (Impala is a Cloudera project). Conversely, the Hive version on Hortonworks Data Platform (HDP) reportedly does not support Parquet for the same reason.