
How to export data from Spark SQL to CSV


You can use the statement below to write the contents of a DataFrame in CSV format:

df.write.csv("/data/home/csv")

If you need to write the whole DataFrame into a single CSV file, then use:

df.coalesce(1).write.csv("/data/home/sample.csv")

For Spark 1.x, you can use spark-csv to write the results into CSV files.

The Scala snippet below shows how:

import org.apache.spark.sql.hive.HiveContext
// sc - existing Spark context
val sqlContext = new HiveContext(sc)
val df = sqlContext.sql("SELECT * FROM testtable")
df.write.format("com.databricks.spark.csv").save("/data/home/csv")

To write the contents into a single file:

import org.apache.spark.sql.hive.HiveContext
// sc - existing Spark context
val sqlContext = new HiveContext(sc)
val df = sqlContext.sql("SELECT * FROM testtable")
df.coalesce(1).write.format("com.databricks.spark.csv").save("/data/home/sample.csv")


Since Spark 2.x, spark-csv is integrated as a native data source, so the necessary statement simplifies to the following (Windows):

df.write
  .option("header", "true")
  .csv("file:///C:/out.csv")

or, on UNIX:

df.write
  .option("header", "true")
  .csv("/var/out.csv")

Note: as the comments say, this creates a directory with that name containing one file per partition, not a standard single CSV file. This, however, is most likely what you want, since otherwise you are either crashing your driver (out of RAM) or working in a non-distributed environment.


The answer above with spark-csv is correct, but there is an issue: the library creates several files based on the DataFrame partitioning, and this is usually not what we need. So you can combine all partitions into one:

df.coalesce(1).
  write.
  format("com.databricks.spark.csv").
  option("header", "true").
  save("myfile.csv")

and then rename the library's output file (named "part-00000") to the desired filename.
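
The rename step could look roughly like the Scala below. This is only a minimal sketch, assuming the output directory is on the local filesystem (for HDFS you would go through the Hadoop FileSystem API instead) and that coalesce(1) left exactly one part file in it. The names CsvExport and promotePartFile are made up for illustration and are not part of Spark or spark-csv.

```scala
import java.nio.file.{Files, Path, StandardCopyOption}

object CsvExport {
  // Hypothetical helper: moves the single part-* file that Spark wrote into
  // `outputDir` to `target`. Assumes a local filesystem and a coalesce(1) write.
  def promotePartFile(outputDir: Path, target: Path): Path = {
    val part: Path = {
      // The glob skips _SUCCESS and the hidden .part-*.crc checksum files.
      val stream = Files.newDirectoryStream(outputDir, "part-*")
      try {
        val it = stream.iterator()
        require(it.hasNext, s"no part-* file found in $outputDir")
        val first = it.next()
        require(!it.hasNext, s"more than one part-* file in $outputDir")
        first
      } finally stream.close()
    }
    // Replace the target if it already exists; returns the target path.
    Files.move(part, target, StandardCopyOption.REPLACE_EXISTING)
  }
}
```

For example, after the write above, something like CsvExport.promotePartFile(java.nio.file.Paths.get("myfile.csv"), java.nio.file.Paths.get("result.csv")) would leave the data in result.csv (a hypothetical target name); the now mostly empty myfile.csv directory can then be deleted.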

This blog post provides more details: https://fullstackml.com/2015/12/21/how-to-export-data-frame-from-apache-spark/