spark-shell error: No FileSystem for scheme: wasb
Another way of setting up Azure Storage (wasb and wasbs files) in spark-shell is:
- Copy the azure-storage and hadoop-azure jars into the ./jars directory of your Spark installation.
- Run spark-shell with the parameter --jars [a comma-separated list of routes to those jars]. Example:
$ bin/spark-shell --master "local[*]" --jars jars/hadoop-azure-2.7.0.jar,jars/azure-storage-2.0.0.jar
- Add the following lines to the Spark context:
sc.hadoopConfiguration.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
sc.hadoopConfiguration.set("fs.azure.account.key.my_account.blob.core.windows.net", "my_key")
- Run a simple query:
sc.textFile("wasb://my_container@my_account.blob.core.windows.net/myfile.txt").count()
- Enjoy :)
With these settings you can easily set up a Spark application, passing the parameters to the 'hadoopConfiguration' of the current Spark context.
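As a minimal sketch, a standalone application could apply the same settings programmatically. The account name, key, container, and file path below are placeholders, and the object name is hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch of a standalone Spark app that sets the wasb properties
// on hadoopConfiguration before reading from Azure Storage.
// my_account, my_key, my_container, and myfile.txt are placeholders.
object WasbCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("wasb-example").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Same two settings used in the spark-shell session above
    sc.hadoopConfiguration.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
    sc.hadoopConfiguration.set("fs.azure.account.key.my_account.blob.core.windows.net", "my_key")

    val count = sc.textFile("wasb://my_container@my_account.blob.core.windows.net/myfile.txt").count()
    println(s"Line count: $count")

    sc.stop()
  }
}
```

Remember to still submit the application with the hadoop-azure and azure-storage jars on the classpath (e.g. via --jars), since setting the properties alone does not load the filesystem implementation.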
Hai Ning from Microsoft has written an excellent blog post on how to set up wasb on an Apache Hadoop installation.
Here is the summary:
1. Add hadoop-azure-*.jar and azure-storage-*.jar to the Hadoop classpath.
   1.1 Find the jars in your local installation. They are in the /usr/hdp/current/hadoop-client folder on an HDInsight cluster.
   1.2 Update the HADOOP_CLASSPATH variable in hadoop-env.sh. Use the exact jar names, as the Java classpath doesn't support partial wildcards.
2. Update core-site.xml:
<property>
  <name>fs.AbstractFileSystem.wasb.impl</name>
  <value>org.apache.hadoop.fs.azure.Wasb</value>
</property>
<property>
  <name>fs.azure.account.key.my_blob_account_name.blob.core.windows.net</name>
  <value>my_blob_account_key</value>
</property>
<!-- optionally set the default file system to a container -->
<property>
  <name>fs.defaultFS</name>
  <value>wasb://my_container_name@my_blob_account_name.blob.core.windows.net</value>
</property>
See exact steps here: https://github.com/hning86/articles/blob/master/hadoopAndWasb.md
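For step 1.2, the hadoop-env.sh change might look like the following sketch. The jar versions and paths here are assumptions; use the exact jar names found in your installation (e.g. under /usr/hdp/current/hadoop-client on HDInsight):

```shell
# Hypothetical hadoop-env.sh addition: append the Azure jars to the
# Hadoop classpath, using exact jar names (no partial wildcards).
export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:/usr/hdp/current/hadoop-client/hadoop-azure-2.7.0.jar:/usr/hdp/current/hadoop-client/lib/azure-storage-2.0.0.jar"
```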