How to remove the parentheses around records when saveAsTextFile on RDD[(String, Int)]? [duplicate]


Use a map transformation to turn each record into a plain string before you save the records to the outputfiles directory, e.g.

wordcountRDD.map { case (k, v) => s"$k, $v" }.saveAsTextFile("/user/cloudera/outputfiles")

See Spark's documentation about map.
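
The parentheses appear because saveAsTextFile writes each element's toString, and a Scala tuple's toString wraps its fields in parentheses. Here is a minimal sketch of the difference, using a plain Seq as a stand-in for the RDD so it runs without Spark. The object name and sample data are made up for illustration.

```scala
object TupleFormatting {
  // Hypothetical sample data standing in for wordcountRDD.
  val wordCounts: Seq[(String, Int)] = Seq(("HI", 1), ("HOW", 1), ("ARE", 1))

  // What gets written when records are left as tuples: the tuple's toString.
  val raw: Seq[String] = wordCounts.map(_.toString)

  // The same records converted to strings explicitly, with no parentheses.
  val formatted: Seq[String] = wordCounts.map { case (k, v) => s"$k,$v" }

  def main(args: Array[String]): Unit = {
    raw.foreach(println)       // lines such as (HI,1)
    formatted.foreach(println) // lines such as HI,1
  }
}
```

The same map function works unchanged on an RDD[(String, Int)], since RDD.map takes an ordinary Scala function.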


I strongly recommend using Datasets instead.

scala> words.toSeq.toDS.groupBy("value").count().show
+-----+-----+
|value|count|
+-----+-----+
|  HOW|    1|
|  ARE|    1|
|   HI|    1|
+-----+-----+

scala> words.toSeq.toDS.groupBy("value").count.write.csv("outputfiles")

$ cat outputfiles/part-00199-aa752576-2f65-481b-b4dd-813262abb6c2-c000.csv
HI,1

See Spark SQL, DataFrames and Datasets Guide.


The parentheses come from the tuple's toString, which saveAsTextFile calls on every record. You can define the output format yourself:

val wordcountRDD = keyvalueRDD.reduceByKey((x, y) => x + y)
  // here we set custom format
  .map(x => x._1 + "," + x._2)
wordcountRDD.saveAsTextFile("/user/cloudera/outputfiles")
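
When records have more than two fields, concatenating x._1, x._2 and so on by hand gets tedious. One possible alternative is productIterator, which every Scala tuple and case class provides. It is sketched here on plain values rather than an RDD, and the names DelimitedRecords and toDelimited are hypothetical, not part of Spark.

```scala
object DelimitedRecords {
  // Hypothetical helper: joins the fields of any tuple or case class
  // with the given delimiter, using each field's toString.
  def toDelimited(record: Product, delimiter: String = ","): String =
    record.productIterator.mkString(delimiter)

  def main(args: Array[String]): Unit = {
    println(toDelimited(("HI", 1)))            // two fields, comma-separated
    println(toDelimited(("HI", 1, 0.5), "\t")) // three fields, tab-separated
  }
}
```

On an RDD this would be applied as rdd.map(r => DelimitedRecords.toDelimited(r)) before saveAsTextFile. The helper does no quoting or escaping, so a field that itself contains the delimiter produces an ambiguous line. For that case the Dataset CSV writer is likely the safer choice.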