
Spark output filename and append on write


1) There is no direct support in the saveAsTextFile method to control the output file name. You can try using saveAsHadoopDataset to control the output file basename.

e.g.: instead of part-00000 you can get yourCustomName-00000.

Keep in mind that you cannot control the 00000 suffix with this method. Spark assigns it automatically while writing so that each partition writes to a unique file.

To control that part too, as mentioned in the comments above, you have to write your own custom OutputFormat (see the sketch after the snippet below).

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

SparkConf conf = new SparkConf();
conf.setMaster("local").setAppName("yello");
JavaSparkContext sc = new JavaSparkContext(conf);

JobConf jobConf = new JobConf();
// Basename for each output file: produces customName-00000, customName-00001, ...
jobConf.set("mapreduce.output.basename", "customName");
jobConf.set("mapred.output.dir", "outputPath");
// saveAsHadoopDataset requires the key, value and output format classes to be set
jobConf.setOutputKeyClass(NullWritable.class);
jobConf.setOutputValueClass(Text.class);
jobConf.setOutputFormat(TextOutputFormat.class);

// saveAsHadoopDataset is only available on pair RDDs, so wrap each line in a tuple
JavaRDD<String> input = sc.textFile("inputDir");
JavaPairRDD<NullWritable, Text> pairs =
    input.mapToPair(line -> new Tuple2<>(NullWritable.get(), new Text(line)));
pairs.saveAsHadoopDataset(jobConf);
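
If you need to rewrite the whole file name (including the partition suffix), the OutputFormat itself is the place to do it. Here is a minimal sketch using the old mapred API; the class name and the "myFile-" prefix are placeholders, not something Spark provides:

import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.Progressable;

// Hypothetical custom OutputFormat that renames each partition's file before
// delegating to TextOutputFormat. "name" arrives as something like part-00000;
// keep something partition-specific in it so each partition still writes a unique file.
public class CustomNameOutputFormat extends TextOutputFormat<NullWritable, Text> {
    @Override
    public RecordWriter<NullWritable, Text> getRecordWriter(
            FileSystem ignored, JobConf job, String name, Progressable progress)
            throws IOException {
        return super.getRecordWriter(ignored, job, "myFile-" + name, progress);
    }
}

You would then plug it in with jobConf.setOutputFormat(CustomNameOutputFormat.class) instead of TextOutputFormat in the snippet above.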

2) A workaround would be to write the output as-is to your output location and then use Hadoop's FileUtil.copyMerge function to form a single merged file.
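
A minimal sketch of that merge step, assuming Hadoop 2.x (FileUtil.copyMerge was removed in Hadoop 3); the paths are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

Configuration hadoopConf = new Configuration();
FileSystem fs = FileSystem.get(hadoopConf);

// Concatenate every part-xxxxx file in the output directory into one file
FileUtil.copyMerge(
    fs, new Path("outputPath"),          // directory written by Spark
    fs, new Path("merged/output.txt"),   // single destination file
    false,                               // keep the original part files
    hadoopConf,
    null);                               // no string appended between parts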