Unable to read keystore file from pyspark
I think you're using the wrong option.
For Python, you can use the --py-files argument of spark-submit to add .py, .zip or .egg files to be distributed with your application.
Instead, you want --files:
--files FILES: Comma-separated list of files to be placed in the working directory of each executor. File paths of these files in executors can be accessed via SparkFiles.get(fileName)
To place it at a different path within the executor, you can use the # separator:
spark-submit ... --files mycerts.jks#/<path>/mycerts.jks
Within the code, you can get a reference to the path from SparkFiles.get("mycerts.jks"), which returns the absolute path to the file.
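As a minimal sketch of how the path returned by SparkFiles.get could be turned into the truststore setting: the helper name truststore_conf is hypothetical, and it assumes the connector accepts a file:// URI built from an absolute POSIX path. The pyspark call is left as a comment so the helper itself runs without a cluster.

```python
import posixpath


def truststore_conf(local_path):
    """Build the truststore setting from an absolute local path.

    local_path is assumed to be the value SparkFiles.get("mycerts.jks")
    returns on the node where the keystore is read.
    """
    # A relative path would yield a URI that points nowhere useful,
    # so reject it early instead of failing later inside the job.
    if not posixpath.isabs(local_path):
        raise ValueError("expected an absolute path, got %r" % local_path)
    # /a/b/mycerts.jks becomes file:///a/b/mycerts.jks
    return {"es.net.ssl.truststore.location": "file://" + local_path}


# Hypothetical usage inside a PySpark job (requires pyspark):
#   from pyspark import SparkFiles
#   es_conf = truststore_conf(SparkFiles.get("mycerts.jks"))
```

Note that SparkFiles.get resolves the path on whichever node calls it, so the value seen on the driver may differ from the one on an executor.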
As pointed out by cricket_007 above, you are using the incorrect option, --py-files. Use the --files option instead to upload your cert files.
Also, these files are not uploaded to the local filesystem of the executors but to HDFS. Thus the path you are passing for the cert file in your Spark code is also incorrect, as it points to the local filesystem:
'es.net.ssl.truststore.location':'file:///<path>/mycerts.jks'
You can use the # separator with the --files option in your spark-submit command to upload the file to a specified path on HDFS:
spark-submit ... --files mycerts.jks#/<path>/mycerts.jks
and then use the same path in your spark code to access the file.
'es.net.ssl.truststore.location':'/<path>/mycerts.jks'
Based on the message in your stack trace, it seems your <path> is clearly wrong. Your path value after --jars and --py-files should be accurate, but alas it's currently not:
Expected to find keystore file at [file:////mycerts.jks] but was unable to. Make sure that it is available on the classpath, or if not, that you have specified a valid URI.
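The URI in that message, file:////mycerts.jks, has nothing between the scheme and the file name, which is what 'file:///<path>/mycerts.jks' becomes when <path> is empty. A rough sketch of a check you could run before submitting the job follows; both function names are hypothetical, and it only verifies the file on the machine where it runs.

```python
import os
from urllib.parse import unquote, urlparse


def keystore_path_from_uri(uri):
    """Return the filesystem path encoded in a file:// URI."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        raise ValueError("expected a file:// URI, got %r" % uri)
    # Undo any percent-encoding so the result is a plain local path.
    return unquote(parsed.path)


def keystore_exists(uri):
    """True if the URI points at an existing regular file on this machine."""
    return os.path.isfile(keystore_path_from_uri(uri))


# The URI from the error message resolves to a file directly under the root.
print(keystore_path_from_uri("file:////mycerts.jks"))
```

If keystore_exists returns False for the value you pass as es.net.ssl.truststore.location, the path needs fixing before Spark ever sees it.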