
Reading data from Azure Blob with Spark


In order to read data from blob storage, two things need to be done. First, you need to tell Spark which native file system to use in the underlying Hadoop configuration. This means that the hadoop-azure JAR must also be available on your classpath (note that additional JARs from the Hadoop family may be required at runtime):

JavaSparkContext ct = new JavaSparkContext();
Configuration config = ct.hadoopConfiguration();
config.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem");
config.set("fs.azure.account.key.youraccount.blob.core.windows.net", "yourkey");
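Note that the account-key property name embeds the storage account's host name, so it can be convenient to build it programmatically. A minimal sketch (the account name "youraccount" is a placeholder, not a real account):

```python
# Build the Hadoop configuration key that holds an Azure storage account key.
# "youraccount" below is an illustrative placeholder account name.
def azure_account_key_property(account_name: str) -> str:
    return f"fs.azure.account.key.{account_name}.blob.core.windows.net"

print(azure_account_key_property("youraccount"))
# fs.azure.account.key.youraccount.blob.core.windows.net
```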

Second, reference the file using the wasb:// prefix (the optional [s] selects a secure connection):

ssc.textFileStream("wasb[s]://<BlobStorageContainerName>@<StorageAccountName>.blob.core.windows.net/<path>");
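Assembling that URL by hand is error-prone, so a small helper can be useful. A sketch, assuming illustrative container, account, and path values:

```python
# Assemble a wasb(s):// URL from container name, storage account, and blob path.
# All argument values in the example call are hypothetical placeholders.
def wasb_url(container: str, account: str, path: str, secure: bool = True) -> str:
    scheme = "wasbs" if secure else "wasb"
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"

print(wasb_url("mycontainer", "myaccount", "data/events"))
# wasbs://mycontainer@myaccount.blob.core.windows.net/data/events
```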

It goes without saying that the host making the query must have the proper permissions to access blob storage.


As a supplement, there is a very helpful tutorial on using HDFS-compatible Azure Blob storage with Hadoop; see https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-use-blob-storage.

Meanwhile, there is an official sample on GitHub for Spark streaming on Azure. The sample is written in Scala, but it should still be helpful.


For example, in PySpark a CSV file in the container can be read directly:

df = spark.read.format("csv").load("wasbs://blob_container@account_name.blob.core.windows.net/example.csv", inferSchema=True)