Handling remote dependencies for spark-submit in Spark 2.3 with Kubernetes
It works as it should with s3a:// URLs. Unfortunately, getting S3A running on the stock spark-hadoop2.7.3 build is problematic (mainly around authentication), so I opted to build Spark against Hadoop 2.9.1, where S3A has seen significant development.
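For reference, the submission looks roughly like this (a sketch: the API-server address, image name, class, and bucket/jar path are all placeholders for your own values):

```shell
# Hypothetical example: cluster-mode submission against Kubernetes,
# with the application jar fetched from S3 via the s3a:// scheme.
bin/spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --name my-app \
  --class com.example.MyApp \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=<registry>/spark:v2.3.0-hadoop2.9.1 \
  s3a://my-bucket/jars/my-app.jar
```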
I have created a gist with the steps needed to:
- build spark with new hadoop dependencies
- build the docker image for k8s
- push image to ECR
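The steps above boil down to something like the following (a sketch from memory, not the gist verbatim: the Maven profile/flags may need adjusting for your environment, and the ECR registry URL and tag are placeholders):

```shell
# 1. Build a Spark distribution against Hadoop 2.9.1
#    (assumes the hadoop-2.7 profile with an overridden hadoop.version).
./dev/make-distribution.sh --name hadoop-2.9.1 --tgz \
  -Pkubernetes -Phadoop-2.7 -Dhadoop.version=2.9.1

# 2. Build the Docker image for Kubernetes with the tool shipped in Spark 2.3.
bin/docker-image-tool.sh \
  -r <account>.dkr.ecr.<region>.amazonaws.com -t v2.3.0-hadoop2.9.1 build

# 3. Log in to ECR (aws CLI v1 syntax) and push the image.
$(aws ecr get-login --no-include-email --region <region>)
bin/docker-image-tool.sh \
  -r <account>.dkr.ecr.<region>.amazonaws.com -t v2.3.0-hadoop2.9.1 push
```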
The script also creates a second Docker image with the S3A dependencies added, plus base conf settings that enable S3A with IAM credentials, so running in AWS doesn't require putting an access/secret key in conf files or arguments.
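The IAM-based settings amount to roughly this (a sketch: the credentials-provider class is the standard one from the AWS SDK, and it assumes your nodes have an instance profile attached):

```
# spark-defaults.conf (sketch) - use the instance profile instead of keys
spark.hadoop.fs.s3a.impl                      org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.aws.credentials.provider  com.amazonaws.auth.InstanceProfileCredentialsProvider
```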
I haven't run any production Spark jobs with the image yet, but I have verified that basic saving and loading against s3a:// URLs works.
I have yet to experiment with S3Guard, which uses DynamoDB to ensure that S3 reads/writes are consistent, similar to EMRFS.
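If I do get to it, enabling S3Guard should amount to something like the following (Hadoop 2.9 property names; untested on my side, and the table name is a placeholder):

```
# spark-defaults.conf additions (sketch, untested) - enable S3Guard
spark.hadoop.fs.s3a.metadatastore.impl       org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
spark.hadoop.fs.s3a.s3guard.ddb.table        my-s3guard-table
spark.hadoop.fs.s3a.s3guard.ddb.table.create true
```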
The init container is created automatically for you by Spark.
For example, you can run
kubectl describe pod [name of your driver pod] and you'll see the init container named spark-init.
You can also access the logs from the init container via a command like:
kubectl logs [name of your driver pod] -c spark-init
Caveat: I'm not running in AWS, but on a custom K8s cluster. My init container successfully downloads dependencies from an HTTP server (but, strangely, not from S3).