
How to configure the Beam Python SDK with Spark in a Kubernetes environment


  1. Using "External" - this definitely seems like a bug in Beam. The worker endpoints are supposed to be set up to use localhost; I don't think it is possible to configure them. I'm not sure why they would be missing; one educated guess is that the servers silently fail to start, leaving the endpoints empty. I filed a bug report (BEAM-11957) for this issue.
  2. Using "Process" - The scheme classpath corresponds to ClassLoaderFileSystem. This file system is usually loaded using AutoService, which depends on ClassLoaderFileSystemRegistrar being present on the classpath (no relation to the name of the file system itself). The classpath of the job jar is based on spark_job_server_jar. Where are you getting your beam-runners-spark-job-server-2.28.0.jar from?