How do I run Beam Python pipelines using Flink deployed on Kubernetes? How do I run Beam Python pipelines using Flink deployed on Kubernetes? kubernetes kubernetes

How do I run Beam Python pipelines using Flink deployed on Kubernetes?


I found the solution. The new version of Apache Beam 2.16.0 provides an implementation to use in combination with environment type EXTERNAL. The implementation is based on worker_pool_main which has been created to support Kubernetes.


To answer the question above, basically you want to add beam_worker_pool container along side with the flink task manager container in the same pods. So in the yaml file that you use to deploy flink task managers, add a new container:

  - name: beam-worker-pool    image: apache/beam_python3.7_sdk:2.22.0    args: ["--worker_pool"]    ports:    - containerPort: 50000      name: pool    livenessProbe:      tcpSocket:        port: 50000      initialDelaySeconds: 30      periodSeconds: 60    volumeMounts:    - name: flink-config-volume      mountPath: /opt/flink/conf/    securityContext:      runAsUser: 9999


I know it is a bit outdated but there is a Flink operator for Kubernetes now.

Here are examples how to run Apache Beam with Flink using an operator:

https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/tree/master/examples/beam