Is it possible to read and write Parquet using Java without a dependency on Hadoop and HDFS?

java hadoop parquet apache-drill data-formats

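You can write Parquet files outside a Hadoop cluster using the Java Parquet client API. The example below still needs the Hadoop client libraries on the classpath (it imports org.apache.hadoop.fs.Path), but it does not need a running cluster or HDFS.

Here is sample Java code that writes a Parquet file to the local disk.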

import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroSchemaConverter;
import org.apache.parquet.avro.AvroWriteSupport;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.api.WriteSupport;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;
import org.apache.parquet.schema.MessageType;

public class Test {

    // Assumed sizes (the original snippet left these undefined): 128 MB row groups, 1 MB pages.
    private static final int BLOCK_SIZE = 128 * 1024 * 1024;
    private static final int PAGE_SIZE = 1024 * 1024;

    void test() throws IOException {
        // Load the Avro schema and derive the matching Parquet schema from it.
        final String schemaLocation = "/tmp/avro_format.json";
        final Schema avroSchema = new Schema.Parser().parse(new File(schemaLocation));
        final MessageType parquetSchema = new AvroSchemaConverter().convert(avroSchema);
        final WriteSupport<GenericRecord> writeSupport = new AvroWriteSupport<>(parquetSchema, avroSchema);

        // With a default Hadoop Configuration, a path without a scheme points at the local file system.
        final String parquetFile = "/tmp/parquet/data.parquet";
        final Path path = new Path(parquetFile);

        // try-with-resources closes the writer, which writes the file footer.
        try (ParquetWriter<GenericRecord> parquetWriter =
                new ParquetWriter<>(path, writeSupport, CompressionCodecName.SNAPPY, BLOCK_SIZE, PAGE_SIZE)) {
            final GenericRecord record = new GenericData.Record(avroSchema);
            record.put("id", 1);
            record.put("age", 10);
            record.put("name", "ABC");
            record.put("place", "BCD");
            parquetWriter.write(record);
        }
    }
}

The schema file, avro_format.json:

{
   "type": "record",
   "name": "Pojo",
   "namespace": "com.xx.test",
   "fields": [
      { "name": "id",    "type": ["int", "null"] },
      { "name": "age",   "type": ["int", "null"] },
      { "name": "name",  "type": ["string", "null"] },
      { "name": "place", "type": ["string", "null"] }
   ]
}
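
If you want a quick sanity check that the writer above really produced a Parquet file, without pulling in any Parquet or Hadoop classes, you can look for the magic bytes: a Parquet file starts and ends with the 4-byte marker PAR1. Below is a minimal sketch using only the JDK. The class name ParquetMagicCheck and the method looksLikeParquet are made up for illustration, and the default path is simply the one used in the writer. It only checks the markers; it does not validate the footer or decode any data.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ParquetMagicCheck {

    // Parquet files start and end with the 4-byte ASCII marker "PAR1".
    private static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the file begins and ends with the Parquet magic bytes. */
    public static boolean looksLikeParquet(String fileName) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(fileName, "r")) {
            // Smallest possible layout: magic + 4-byte footer length + magic.
            if (file.length() < 12) {
                return false;
            }
            byte[] head = new byte[MAGIC.length];
            byte[] tail = new byte[MAGIC.length];
            file.readFully(head);
            file.seek(file.length() - MAGIC.length);
            file.readFully(tail);
            return Arrays.equals(head, MAGIC) && Arrays.equals(tail, MAGIC);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default: the output path used by the writer example.
        String fileName = args.length > 0 ? args[0] : "/tmp/parquet/data.parquet";
        if (!new File(fileName).isFile()) {
            System.out.println(fileName + " does not exist");
            return;
        }
        System.out.println(fileName + " looks like Parquet: " + looksLikeParquet(fileName));
    }
}
```

Run it with the path of the file you wrote as the first argument. To actually read the records back you still need a Parquet reader library; this check only tells you the file has the expected framing.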

Hope this helps.

CodeHunter
