
Dependency issue with PySpark running on Kubernetes using spark-on-k8s-operator


Use the Google Cloud Storage path to the Python dependencies, since that is where they are uploaded:

spec:
  deps:
    pyFiles:
      - gs://gcs-bucket-name/deps.zip
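
For context, here is a minimal sketch of where that field sits in a full SparkApplication manifest; the application name, bucket paths, service account, and resource sizes are placeholders, not taken from the question:

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: my-pyspark-app            # placeholder name
  namespace: default
spec:
  type: Python
  mode: cluster
  image: gcr.io/spark-operator/spark-py:v2.4.5
  sparkVersion: "2.4.5"
  mainApplicationFile: gs://gcs-bucket-name/main.py   # placeholder path
  deps:
    pyFiles:
      - gs://gcs-bucket-name/deps.zip                 # the zip from above
  driver:
    cores: 1
    memory: "512m"
    serviceAccount: spark         # placeholder service account
  executor:
    cores: 1
    instances: 1
    memory: "512m"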


If the zip file contains JARs that you will always require while running your Spark job: facing a similar issue, I just added

FROM gcr.io/spark-operator/spark-py:v2.4.5
COPY mydepjars/ /opt/spark/jars/

With that, everything gets loaded within my Spark session. This could be one way to do it.
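
To use this custom image, you would build and push it to a registry the cluster can pull from, then point the SparkApplication at it. A minimal sketch; the registry and image name below are placeholders:

# Build and push the custom image first (shell commands shown as comments):
#   docker build -t gcr.io/my-project/spark-py-deps:v2.4.5 .
#   docker push gcr.io/my-project/spark-py-deps:v2.4.5
spec:
  image: gcr.io/my-project/spark-py-deps:v2.4.5   # replaces the stock spark-py image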