How to read specific fields from Avro-Parquet file in Java?

So...

A couple of things:

AvroReadSupport.setRequestedProjection(hadoopConf, ClassB.SCHEMA$) can be used to set a projection, so that only the columns present in ClassB's schema are read.
The reader.read() method will still return a ClassA object, but the fields that are not present in ClassB will be nulled out.
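To make the "nulled out" behaviour concrete, here is a small plain-Java sketch that only imitates it. It does not use Parquet or Avro at all: the record is modelled as a map, and the `project` helper, the `ProjectionSketch` class and the field names (`id`, `name`, `payload`) are all made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class ProjectionSketch {
    // Hypothetical stand-in for a projected read: every field of the full
    // record is still present as a key, but only requested fields keep a value.
    static Map<String, Object> project(Map<String, Object> fullRecord, Set<String> requested) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : fullRecord.entrySet()) {
            result.put(e.getKey(), requested.contains(e.getKey()) ? e.getValue() : null);
        }
        return result;
    }

    public static void main(String[] args) {
        // Plays the role of a ClassA record as stored in the file (made-up fields).
        Map<String, Object> classA = new LinkedHashMap<>();
        classA.put("id", 1L);
        classA.put("name", "alice");
        classA.put("payload", "large blob");

        // Plays the role of ClassB.SCHEMA$: the subset of fields we ask for.
        Set<String> classBFields = new HashSet<>(Arrays.asList("id", "name"));

        // prints {id=1, name=alice, payload=null}
        System.out.println(project(classA, classBFields));
    }
}
```

So the shape of the object you get back is still ClassA; you just have to treat anything outside the projection as missing.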

To use the reader directly you can do the following:

    // Ask Parquet to read only the columns present in ClassB's schema.
    AvroReadSupport.setRequestedProjection(hadoopConf, ClassB.SCHEMA$);
    // The reader yields ClassA records, so the builder is typed on ClassA.
    final Builder<ClassA> builder = AvroParquetReader.builder(files[0].getPath());
    final List<ClassA> list = new ArrayList<>();
    try (ParquetReader<ClassA> reader = builder.withConf(hadoopConf).build()) {
        ClassA record;
        while ((record = reader.read()) != null) {
            list.add(record);
        }
    }

Also, if you're planning to use an InputFormat to read the Avro-Parquet file, there is a convenience method. Here is a Spark example:

    final Job job = Job.getInstance(hadoopConf);
    ParquetInputFormat.setInputPaths(job, pathGlob);
    AvroParquetInputFormat.setRequestedProjection(job, ClassB.SCHEMA$);
    @SuppressWarnings("unchecked")
    final JavaPairRDD<Void, ClassA> rdd = sc.newAPIHadoopRDD(job.getConfiguration(),
            AvroParquetInputFormat.class, Void.class, ClassA.class);
