How to convert .txt / .csv file to ORC format
You can insert text data into an ORC table with a command like:
insert overwrite table orcTable select * from textTable;
Here, orcTable is created by the following command:
create table orcTable(name string, city string) stored as orc;
And textTable has the same structure as orcTable, but stored as plain text.
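For completeness, here is a sketch of how textTable might be created and populated before the insert. The column names come from the example above; the field delimiter and the input path are assumptions you would adjust for your file:

```sql
-- Plain-text staging table with the same columns as orcTable
CREATE TABLE textTable(name STRING, city STRING)
  ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  STORED AS TEXTFILE;

-- Load the source file, then copy everything into the ORC table
LOAD DATA LOCAL INPATH '/path/to/input.csv' INTO TABLE textTable;
INSERT OVERWRITE TABLE orcTable SELECT * FROM textTable;
```

The LOAD DATA step only moves the file into the table's directory; the actual conversion to ORC happens during the INSERT, when Hive rewrites the rows in the target table's storage format.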
You can use Spark DataFrames to convert a delimited file to ORC format very easily. You can also impose a schema and filter out specific columns along the way.
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class OrcConvert {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("OrcConvert");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        HiveContext hiveContext = new HiveContext(jsc);

        String inputPath = args[0];
        String outputPath = args[1];

        // Read the delimited file via the spark-csv package
        DataFrame inputDf = hiveContext.read().format("com.databricks.spark.csv")
                .option("quote", "'")
                .option("delimiter", "\001")
                .load(inputPath);

        // Write the DataFrame back out in ORC format
        inputDf.write().orc(outputPath);
    }
}
Make sure all dependencies are met, and that Hive is running, since HiveContext is used here; at the time of writing, Spark supports the ORC format only through HiveContext.
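As a side note, if you are on Spark 2.x (where HiveContext and DataFrame are deprecated in favor of SparkSession and Dataset&lt;Row&gt;, and CSV reading is built in), the same conversion could be sketched like this; the app name and the use of command-line arguments for paths mirror the example above:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class OrcConvert2 {
    public static void main(String[] args) {
        // Spark 2.x: SparkSession replaces SQLContext/HiveContext
        SparkSession spark = SparkSession.builder()
                .appName("OrcConvert2")
                .enableHiveSupport() // only needed if you also read/write Hive tables
                .getOrCreate();

        // The built-in csv source replaces com.databricks.spark.csv
        Dataset<Row> df = spark.read()
                .option("delimiter", "\001")
                .option("quote", "'")
                .csv(args[0]);

        // ORC no longer requires HiveContext in Spark 2.x
        df.write().orc(args[1]);

        spark.stop();
    }
}
```

In Spark 2.x the ORC writer works without a running Hive metastore, so enableHiveSupport() can be dropped if you only convert files on disk.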