Spark 2.0 deprecates 'DirectParquetOutputCommitter', how to live without it?

hadoop apache-spark amazon-s3 amazon-emr parquet

You can use: sparkContext.hadoopConfiguration.set("mapreduce.fileoutputcommitter.algorithm.version", "2")

Since you are on EMR, just use the s3:// scheme in your output paths (no need for s3a://).
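A minimal sketch of how the setting fits into a Spark 2.0 job, in case it helps. The bucket name, output path, app name, and the toy DataFrame are all made up for illustration; only the configuration key and value come from this answer. As far as I understand it, version 2 of the commit algorithm moves each task's output into the final directory at task commit rather than renaming everything at job commit, which is what tends to be slow on S3.

```scala
import org.apache.spark.sql.SparkSession

object ParquetToS3Sketch {
  def main(args: Array[String]): Unit = {
    // On EMR the master and deploy settings normally come from spark-submit.
    val spark = SparkSession.builder()
      .appName("parquet-to-s3-sketch") // hypothetical app name
      .getOrCreate()

    // The setting from this answer: use version 2 of the
    // FileOutputCommitter algorithm for subsequent writes.
    spark.sparkContext.hadoopConfiguration
      .set("mapreduce.fileoutputcommitter.algorithm.version", "2")

    // Toy data, only so there is something to write.
    val df = spark.range(0, 1000).toDF("id")

    // Hypothetical bucket and prefix; note the s3:// scheme on EMR.
    df.write
      .mode("overwrite")
      .parquet("s3://my-example-bucket/output/ids")

    spark.stop()
  }
}
```

The same key can presumably also be passed at submit time as `--conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2`, since Spark copies `spark.hadoop.*` properties into the Hadoop configuration.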

We are using Spark 2.0 with this setting and writing Parquet to S3 pretty fast (about as fast as writing to HDFS).

If you want to read more, check out the JIRA ticket SPARK-10063.

I think the S3 committer from Netflix has already been open-sourced at: https://github.com/rdblue/s3committer.