How to convert .txt / .csv file to ORC format

You can insert text data into an ORC table with a statement like:

insert overwrite table orcTable select * from textTable;

Here, orcTable is an ORC table created with the following command:

create table orcTable(name string, city string) stored as orc;

The source table textTable has the same column structure as orcTable, but is stored as plain text.
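Putting the pieces together, the full Hive workflow might look like the sketch below. The comma delimiter and the file path /tmp/people.csv are assumptions for illustration; adjust them to match your actual CSV file.

-- Plain-text staging table matching the CSV layout (comma-delimited is assumed)
create table textTable(name string, city string)
  row format delimited fields terminated by ','
  stored as textfile;

-- Load the CSV file into the staging table (path is hypothetical)
load data local inpath '/tmp/people.csv' into table textTable;

-- Target table stored as ORC
create table orcTable(name string, city string) stored as orc;

-- Rewrite the text data in ORC format
insert overwrite table orcTable select * from textTable;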


Alternatively, you can use Spark DataFrames to convert a delimited file to ORC format very easily. You can also impose a schema and select specific columns along the way.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class OrcConvert {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("OrcConvert");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        HiveContext hiveContext = new HiveContext(jsc);

        String inputPath = args[0];
        String outputPath = args[1];

        // Read the delimited input via the spark-csv package;
        // here fields are separated by the \001 control character
        DataFrame inputDf = hiveContext.read().format("com.databricks.spark.csv")
                .option("quote", "'")
                .option("delimiter", "\001")
                .load(inputPath);

        // Write the same data back out in ORC format
        inputDf.write().orc(outputPath);
    }
}

Make sure all dependencies are met: a Hive installation must be available in order to use HiveContext, and at the time of writing Spark supports the ORC format only through HiveContext.