How do I run Beam Python pipelines using Flink deployed on Kubernetes?
I found the solution. The new Apache Beam release, 2.16.0, provides an implementation to use with environment type EXTERNAL. It is based on worker_pool_main, which was created to support Kubernetes.
To answer the question above: you want to add a beam_worker_pool container alongside the Flink task manager container in the same pod. So in the YAML file that you use to deploy the Flink task managers, add a new container:
    - name: beam-worker-pool
      image: apache/beam_python3.7_sdk:2.22.0
      args: ["--worker_pool"]
      ports:
      - containerPort: 50000
        name: pool
      livenessProbe:
        tcpSocket:
          port: 50000
        initialDelaySeconds: 30
        periodSeconds: 60
      volumeMounts:
      - name: flink-config-volume
        mountPath: /opt/flink/conf/
      securityContext:
        runAsUser: 9999
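On the pipeline side, the job then has to be pointed at that worker pool. Here is a minimal sketch of the pipeline options, assuming the worker pool is reachable from the task manager at localhost:50000 (same pod, port taken from the container spec above). The helper name flink_external_worker_args and the Flink master address flink-jobmanager:8081 are made up for illustration; substitute the address of your own job manager.

```python
def flink_external_worker_args(flink_master, worker_pool_address="localhost:50000"):
    """Build Beam pipeline flags for a Flink runner with an EXTERNAL worker pool.

    flink_master: host:port of the Flink job manager (assumption: you know
        this address from your own deployment).
    worker_pool_address: where the beam-worker-pool container listens; it
        shares the pod with the task manager, so localhost should work.
    """
    return [
        "--runner=FlinkRunner",
        f"--flink_master={flink_master}",
        "--environment_type=EXTERNAL",
        f"--environment_config={worker_pool_address}",
    ]


if __name__ == "__main__":
    # Hypothetical job manager address, for illustration only.
    for flag in flink_external_worker_args("flink-jobmanager:8081"):
        print(flag)
```

The resulting list can be passed to PipelineOptions from apache_beam.options.pipeline_options when constructing the pipeline, or the same flags can be given on the command line when launching the job.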
I know this question is a bit old, but there is now a Flink operator for Kubernetes.
Here are examples of how to run Apache Beam with Flink using the operator:
https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/tree/master/examples/beam