Apache Flink deployment in Kubernetes - availability & scalability
Though k8s can restart the failed Flink pod, how can Flink restore its state to recover?
From the Flink documentation:

> Checkpoints allow Flink to recover state and positions in the streams to give the application the same semantics as a failure-free execution.
This means you need checkpoint storage mounted in (or reachable from) your pods so Flink can recover its state after a restart.
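As a minimal sketch, checkpoint storage is configured in `flink-conf.yaml`. The directory `/flink-checkpoints` and the 10-second interval below are assumptions for illustration; point the directory at durable storage such as a PersistentVolume mount, S3, or HDFS:

```yaml
# flink-conf.yaml (sketch) -- persist checkpoints outside the pod's ephemeral filesystem
state.backend: filesystem
# Assumed path: mount a durable volume here, or use an s3:// or hdfs:// URI instead
state.checkpoints.dir: file:///flink-checkpoints
# Illustrative interval: take a checkpoint every 10 seconds
execution.checkpointing.interval: 10s
```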
In Kubernetes you could use Persistent Volumes to share the data across your pods.
There are actually many supported volume plugins; see the Kubernetes Persistent Volumes documentation.
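For example, here is a sketch of a PersistentVolumeClaim that Flink pods could mount at the checkpoint directory; the claim name, access mode, and size are illustrative assumptions:

```yaml
# Sketch: durable storage for Flink checkpoints (names and size are assumed)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: flink-checkpoints
spec:
  accessModes:
    - ReadWriteMany     # JobManager and TaskManagers all need access
  resources:
    requests:
      storage: 10Gi     # illustrative size
```

The pod spec would then mount this claim at the path used in `state.checkpoints.dir`.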
You can run multiple replicas of the TaskManager for scalability, but in Kubernetes you don't need to take care of HA for the JobManager yourself, since you can rely on a Kubernetes self-healing Deployment. To use one, just create a Deployment and set `replicas` to 1, like this:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - name: http
          containerPort: 80
        imagePullPolicy: IfNotPresent
```
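The same pattern applies to Flink itself. Here is a rough sketch of a JobManager Deployment based on the official `flink` image; the ports and args follow the standard session-cluster examples in the Flink docs, but verify the details against your Flink version:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-jobmanager
spec:
  replicas: 1                   # Kubernetes recreates this single pod on failure
  selector:
    matchLabels:
      app: flink
      component: jobmanager
  template:
    metadata:
      labels:
        app: flink
        component: jobmanager
    spec:
      containers:
      - name: jobmanager
        image: flink:latest     # pin an exact version in practice
        args: ["jobmanager"]
        ports:
        - containerPort: 6123   # RPC
        - containerPort: 8081   # web UI
```

A TaskManager Deployment looks the same with `args: ["taskmanager"]` and a higher `replicas` value for scaling.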
Finally, you can check these links to help you set up Flink in Kubernetes:
running-apache-flink-on-kubernetes