How to remove the parentheses around records when saveAsTextFile on RDD[(String, Int)]? [duplicate]
Use a map transformation to format each record as a string before you save the records to the outputfiles directory, e.g.
wordcountRDD.map { case (k, v) => s"$k, $v" }.saveAsTextFile("/user/cloudera/outputfiles")
See Spark's documentation about map.
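If the records have more than two fields, writing a pattern match for every arity gets tedious. A minimal sketch of a generic formatter, assuming every field has a sensible toString (the object and method names here are hypothetical, not part of Spark):

```scala
// Hypothetical helper: joins the fields of any tuple (or case class) with commas.
// Every Scala tuple is a Product, so productIterator walks its fields in order.
object RecordFormat {
  def toCsvLine(record: Product): String =
    record.productIterator.mkString(",")

  def main(args: Array[String]): Unit = {
    println(toCsvLine(("HI", 1)))        // two fields
    println(toCsvLine(("HI", 1, 2.5)))   // three fields, same helper
  }
}
```

On the RDD from the question this would presumably be used as wordcountRDD.map(RecordFormat.toCsvLine).saveAsTextFile(...). Note that it does no quoting or escaping, so a key that itself contains a comma would produce an ambiguous line.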
I strongly recommend using Datasets instead.
scala> words.toSeq.toDS.groupBy("value").count().show
+-----+-----+
|value|count|
+-----+-----+
|  HOW|    1|
|  ARE|    1|
|   HI|    1|
+-----+-----+

scala> words.toSeq.toDS.groupBy("value").count.write.csv("outputfiles")

$ cat outputfiles/part-00199-aa752576-2f65-481b-b4dd-813262abb6c2-c000.csv
HI,1
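The parentheses come from the Tuple itself: saveAsTextFile writes each record using its toString, and a Tuple's toString wraps the fields in parentheses. You can define the format manually: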
val wordcountRDD = keyvalueRDD.reduceByKey((x,y) => x+y)
  // here we set custom format
  .map(x => x._1 + "," + x._2)
wordcountRDD.saveAsTextFile("/user/cloudera/outputfiles")
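To see the difference without a cluster, here is a small sketch that uses a plain Seq as a stand-in for the RDD (the sample data and the object name are assumptions for illustration). It compares the default tuple string with the manually formatted one:

```scala
// A plain Seq stands in for RDD[(String, Int)]; the data is made up.
object TupleToStringDemo {
  val counts: Seq[(String, Int)] = Seq(("HI", 1), ("HOW", 1), ("ARE", 1))

  // What gets written when the records are still tuples: toString on each one.
  val defaultLines: Seq[String] = counts.map(_.toString)

  // What gets written after mapping each tuple to a string first.
  val customLines: Seq[String] = counts.map(x => x._1 + "," + x._2)

  def main(args: Array[String]): Unit = {
    defaultLines.foreach(println)
    customLines.foreach(println)
  }
}
```

The same map applied to the RDD changes its type from RDD[(String, Int)] to RDD[String], which is why the saved lines no longer carry parentheses.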