
Write dataframe to blob using azure databricks


Below is a code snippet for writing a DataFrame as CSV data directly to an Azure Blob Storage container from an Azure Databricks notebook.

# Configure blob storage account access key globally
spark.conf.set(
    "fs.azure.account.key.%s.blob.core.windows.net" % storage_name,
    sas_key)

output_container_path = "wasbs://%s@%s.blob.core.windows.net" % (output_container_name, storage_name)
output_blob_folder = "%s/wrangled_data_folder" % output_container_path

# Write the dataframe as a single file to blob storage
(dataframe
 .coalesce(1)
 .write
 .mode("overwrite")
 .option("header", "true")
 .format("com.databricks.spark.csv")
 .save(output_blob_folder))

# Get the name of the wrangled-data CSV file that was just saved to Azure blob storage (it starts with 'part-')
files = dbutils.fs.ls(output_blob_folder)
output_file = [x for x in files if x.name.startswith("part-")]

# Move the wrangled-data CSV file from a sub-folder (wrangled_data_folder) to the root of the blob container,
# while simultaneously changing the file name
dbutils.fs.mv(output_file[0].path, "%s/predict-transform-output.csv" % output_container_path)
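The snippet assumes that storage_name, sas_key, output_container_name, and dataframe already exist in the notebook. A minimal, illustrative setup might look like the following; the account name, container name, and secret scope are hypothetical placeholders to replace with your own values:

# Hypothetical values - replace with your own storage account and container
storage_name = "mystorageaccount"
output_container_name = "mycontainer"

# The access key can come from a Databricks secret scope (scope/key names here are examples),
# or be pasted directly for a quick test
sas_key = dbutils.secrets.get(scope="my-scope", key="storage-account-key")

# Any Spark DataFrame works; here a tiny one built in the notebook for illustration
dataframe = spark.createDataFrame(
    [(1, "alpha"), (2, "beta")],
    ["id", "label"])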

Example: notebook (screenshot in the original answer)

Output: dataframe written to blob storage using Azure Databricks (screenshot in the original answer)
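If you want to confirm the write from the same notebook, one way (a sketch, assuming the variables from the snippet above are still defined) is to read the renamed CSV back into a DataFrame:

# Read the moved file back from the container root to verify it was written correctly
verify_df = (spark.read
             .option("header", "true")
             .csv("%s/predict-transform-output.csv" % output_container_path))
verify_df.show(5)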


This answer also deletes the wrangled_data_folder afterwards, leaving you with only the file you need.

storage_name = "YOUR_STORAGE_NAME"
storage_access_key = "YOUR_STORAGE_ACCESS_KEY"
output_container_name = "YOUR_CONTAINER_NAME"

# Configure blob storage account access key globally
spark.conf.set("fs.azure.account.key.%s.blob.core.windows.net" % storage_name, storage_access_key)

output_container_path = "wasbs://%s@%s.blob.core.windows.net" % (output_container_name, storage_name)
output_blob_folder = "%s/wrangled_data_folder" % output_container_path

# Write the dataframe as a single file to blob storage
(dataframe
 .coalesce(1)
 .write
 .mode("overwrite")
 .option("header", "true")
 .format("com.databricks.spark.csv")
 .save(output_blob_folder))

# Get the name of the wrangled-data CSV file that was just saved to Azure blob storage (it starts with 'part-')
files = dbutils.fs.ls(output_blob_folder)
output_file = [x for x in files if x.name.startswith("part-")]

# Move the wrangled-data CSV file from a sub-folder (wrangled_data_folder) to the root of the blob container,
# while simultaneously changing the file name
dbutils.fs.mv(output_file[0].path, "%s/predict-transform-output.csv" % output_container_path)

# Delete the 'wrangled_data_folder' and its contents, leaving only the renamed file
dbutils.fs.rm("%s/wrangled_data_folder" % output_container_path, True)
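As a quick sanity check (a sketch under the same assumptions), you can list the container root after the cleanup; only predict-transform-output.csv should remain alongside any blobs that were already in the container:

# List everything at the container root after the cleanup
for f in dbutils.fs.ls(output_container_path):
    print(f.name, f.size)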