How to load RDDs from S3 files from spark-shell?

org.apache.hadoop.fs.StreamCapabilities is in hadoop-common-3.1.jar. You are probably mixing versions of Hadoop JARs, which, as covered in the s3a troubleshooting docs, is doomed to fail.

The Spark shell works fine with the right JARs on the classpath. But the ASF Spark releases don't work with Hadoop 3.x yet, due to some outstanding issues. Stick to Hadoop 2.8.x and you'll get good S3 performance without so much pain.
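
For reference, a quick way to sanity-check the classpath from inside spark-shell, and the usual s3a read pattern once the JARs line up; this is a sketch, not part of the original answer, and the credentials and bucket path below are illustrative placeholders:

    // Check which Hadoop version the shell actually loaded,
    // to catch mixed JARs before blaming S3A itself.
    org.apache.hadoop.util.VersionInfo.getVersion

    // With a consistent set of 2.8.x JARs, the usual s3a read works.
    // Credentials and path here are placeholders, not real values.
    sc.hadoopConfiguration.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
    sc.hadoopConfiguration.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
    val rdd = sc.textFile("s3a://some-bucket/some/prefix/data.txt")
    rdd.take(5).foreach(println)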


I found a setup that fixed the issue, though I have no idea why it works.

  1. Create an SBT IntelliJ project
  2. Include the below dependencies and overrides
  3. Run the script (sans the require statement) from the sbt console; a sketch of such a script appears below

    scalaVersion := "2.11.12"

    libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.0"
    libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "3.1.0"
    libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "3.1.0"

    dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-core" % "2.8.7"
    dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-databind" % "2.8.7"
    dependencyOverrides += "com.fasterxml.jackson.module" % "jackson-module-scala_2.11" % "2.8.7"

The key part, naturally, is overriding the Jackson dependencies.
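
For completeness, here is a sketch of the kind of script step 3 refers to, run from the sbt console; the bucket name, object path, and credential lookup are illustrative assumptions, not taken from the original answer:

    import org.apache.spark.{SparkConf, SparkContext}

    // A local SparkContext; with the overridden Jackson JARs on the
    // classpath, the version clash that breaks spark-shell goes away.
    val conf = new SparkConf().setAppName("s3a-read").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Credentials via environment variables; placeholders for illustration.
    sc.hadoopConfiguration.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
    sc.hadoopConfiguration.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

    // Hypothetical bucket and prefix.
    val rdd = sc.textFile("s3a://some-bucket/some/prefix/part-*.txt")
    println(rdd.count())
    sc.stop()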