How can I access S3/S3n from a local Hadoop 2.6 installation?


For some reason, the jar hadoop-aws-[version].jar, which contains the implementation of NativeS3FileSystem, is not on Hadoop's classpath by default in versions 2.6 and 2.7. So add it to the classpath by appending the following line to hadoop-env.sh, which is located at $HADOOP_HOME/etc/hadoop/hadoop-env.sh:

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/tools/lib/*

This assumes you are using Apache Hadoop 2.6 or 2.7.
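Once the jar is on the classpath, Hadoop still needs AWS credentials before an s3n:// URL will resolve. A minimal sketch of the relevant core-site.xml properties (the key values below are placeholders you must replace with your own):

```xml
<!-- $HADOOP_HOME/etc/hadoop/core-site.xml -->
<configuration>
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
</configuration>
```

With that in place, something like `bin/hadoop fs -ls s3n://your-bucket/` should work.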

By the way, you can check Hadoop's classpath with:

bin/hadoop classpath
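The output is one long colon-separated string, so splitting it into lines makes it easy to confirm the tools jars are actually present. A small sketch (the classpath string below is illustrative; in practice, pipe the real output of `bin/hadoop classpath` instead):

```shell
# Split a colon-separated classpath into one entry per line,
# then look for the hadoop-aws jar.
CLASSPATH='/opt/hadoop/share/hadoop/common/lib/*:/opt/hadoop/share/hadoop/tools/lib/hadoop-aws-2.6.0.jar'
echo "$CLASSPATH" | tr ':' '\n' | grep 'hadoop-aws'
```

If grep prints nothing, the export in hadoop-env.sh did not take effect.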


import os
# Pull in the AWS SDK and hadoop-aws jars when the shell starts
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.amazonaws:aws-java-sdk:1.10.34,org.apache.hadoop:hadoop-aws:2.6.0 pyspark-shell'

import pyspark
sc = pyspark.SparkContext("local[*]")

from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

# Configure the S3 filesystem implementation and credentials
hadoopConf = sc._jsc.hadoopConfiguration()
myAccessKey = input()
mySecretKey = input()
hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
hadoopConf.set("fs.s3.awsAccessKeyId", myAccessKey)
hadoopConf.set("fs.s3.awsSecretAccessKey", mySecretKey)

df = sqlContext.read.parquet("s3://myBucket/myKey")


@Ashrith's answer worked for me with one modification: I had to use $HADOOP_PREFIX rather than $HADOOP_HOME when running v2.6 on Ubuntu, perhaps because $HADOOP_HOME is being deprecated.

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:${HADOOP_PREFIX}/share/hadoop/tools/lib/*

Having said that, neither worked for me on my Mac with v2.6 installed via Homebrew. In that case, I'm using this extremely kludgy export:

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$(brew --prefix hadoop)/libexec/share/hadoop/tools/lib/*