
Write dataframe to blob using azure databricks


Below is a code snippet for writing a DataFrame as CSV data directly to an Azure Blob Storage container from an Azure Databricks notebook.

# Configure blob storage account access key globally
spark.conf.set(
    "fs.azure.account.key.%s.blob.core.windows.net" % storage_name,
    sas_key)

output_container_path = "wasbs://%s@%s.blob.core.windows.net" % (output_container_name, storage_name)
output_blob_folder = "%s/wrangled_data_folder" % output_container_path

# Write the dataframe as a single file to blob storage
(dataframe
 .coalesce(1)
 .write
 .mode("overwrite")
 .option("header", "true")
 .format("com.databricks.spark.csv")
 .save(output_blob_folder))

# Get the name of the wrangled-data CSV file that was just saved to Azure blob storage (it starts with 'part-')
files = dbutils.fs.ls(output_blob_folder)
output_file = [x for x in files if x.name.startswith("part-")]

# Move the wrangled-data CSV file from a sub-folder (wrangled_data_folder) to the root of the blob container,
# while simultaneously changing the file name
dbutils.fs.mv(output_file[0].path, "%s/predict-transform-output.csv" % output_container_path)
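The snippet assumes that storage_name, sas_key, output_container_name, and dataframe already exist in the notebook. A minimal, illustrative setup might look like the following; the account name, container name, and secret scope are hypothetical placeholders to replace with your own values:

# Hypothetical values - replace with your own storage account and container
storage_name = "mystorageaccount"
output_container_name = "mycontainer"

# The access key can come from a Databricks secret scope (scope/key names here are examples),
# or be pasted directly for a quick test
sas_key = dbutils.secrets.get(scope="my-scope", key="storage-account-key")

# Any Spark DataFrame works; here a tiny one built in the notebook for illustration
dataframe = spark.createDataFrame(
    [(1, "alpha"), (2, "beta")],
    ["id", "label"])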

Example: notebook (screenshot in the original answer)

Output: dataframe written to blob storage using Azure Databricks (screenshot in the original answer)
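If you want to confirm the write from the same notebook, one way (a sketch, assuming the variables from the snippet above are still defined) is to read the renamed CSV back into a DataFrame:

# Read the moved file back from the container root to verify it was written correctly
verify_df = (spark.read
             .option("header", "true")
             .csv("%s/predict-transform-output.csv" % output_container_path))
verify_df.show(5)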


This answer also deletes the wrangled_data_folder afterwards, leaving you with only the file you need.

storage_name = "YOUR_STORAGE_NAME"
storage_access_key = "YOUR_STORAGE_ACCESS_KEY"
output_container_name = "YOUR_CONTAINER_NAME"

# Configure blob storage account access key globally
spark.conf.set("fs.azure.account.key.%s.blob.core.windows.net" % storage_name, storage_access_key)

output_container_path = "wasbs://%s@%s.blob.core.windows.net" % (output_container_name, storage_name)
output_blob_folder = "%s/wrangled_data_folder" % output_container_path

# Write the dataframe as a single file to blob storage
(dataframe
 .coalesce(1)
 .write
 .mode("overwrite")
 .option("header", "true")
 .format("com.databricks.spark.csv")
 .save(output_blob_folder))

# Get the name of the wrangled-data CSV file that was just saved to Azure blob storage (it starts with 'part-')
files = dbutils.fs.ls(output_blob_folder)
output_file = [x for x in files if x.name.startswith("part-")]

# Move the wrangled-data CSV file from a sub-folder (wrangled_data_folder) to the root of the blob container,
# while simultaneously changing the file name
dbutils.fs.mv(output_file[0].path, "%s/predict-transform-output.csv" % output_container_path)

# Delete the 'wrangled_data_folder' and its contents, leaving only the renamed file
dbutils.fs.rm("%s/wrangled_data_folder" % output_container_path, True)
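As a quick sanity check (a sketch under the same assumptions), you can list the container root after the cleanup; only predict-transform-output.csv should remain alongside any blobs that were already in the container:

# List everything at the container root after the cleanup
for f in dbutils.fs.ls(output_container_path):
    print(f.name, f.size)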