Continuous deployment for stateful Apache Flink application on Kubernetes


  • You might be happier with Ververica Platform: Community Edition, which raises the level of abstraction to the point where you don't have to deal with the details at this level. It has an API that was designed with CI/CD in mind.
  • I'm not sure I understand your second point, but it's normal for your job to rewind and reprocess some data during recovery. Flink does not guarantee exactly-once processing of each event, but rather exactly-once semantics: each event will affect the state managed by Flink exactly once. This is done by rolling back to the offsets in the most recent checkpoint, and rolling back all of the other state to what it was after consuming the data up to those offsets.
  • Having a state backend is necessary as a place to store your job's working state while the job is running. If you don't enable checkpointing, the working state won't be checkpointed and cannot be recovered. However, as of Flink 1.11, you can enable checkpointing via the config file, using the two settings below (a programmatic equivalent is sketched just after this list):

    execution.checkpointing.interval: 60000
    execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION
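For completeness, the same behavior can also be configured programmatically when building the job; this also touches the exactly-once point above, since the checkpointing mode is chosen here. A minimal sketch, where the job name and the placeholder pipeline are just illustrative:

    import org.apache.flink.streaming.api.CheckpointingMode;
    import org.apache.flink.streaming.api.environment.CheckpointConfig;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class CheckpointedJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Checkpoint every 60 seconds with exactly-once state semantics
            // (equivalent to execution.checkpointing.interval: 60000)
            env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);

            // Keep the latest checkpoint when the job is cancelled, so a new
            // version of the job can be restarted from it (equivalent to
            // execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION)
            env.getCheckpointConfig().enableExternalizedCheckpoints(
                    CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

            // Placeholder pipeline; replace with your real sources and sinks
            env.fromElements(1, 2, 3).print();

            env.execute("checkpointed-job");
        }
    }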


There are several ways to deploy workloads to Kubernetes: plain YAML manifests, Helm charts, and Operators.

Upgrading a stateful Flink job is not as simple as upgrading a stateless service, where you only need to update the binary and restart.

To upgrade a Flink job, you need to take a savepoint (or get the latest checkpoint directory), then update the binary, and finally resubmit your job. Plain YAML manifests and Helm charts cannot help you achieve this on their own, so you should consider using a Flink Operator to handle the upgrade, for example:

https://github.com/GoogleCloudPlatform/flink-on-k8s-operator
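If you want to script the first step yourself (for example from a CI pipeline, before an operator or a deploy script takes over redeployment), the savepoint can be triggered through the JobManager's REST API. A minimal Java sketch; the JobManager address, job id, and savepoint directory below are placeholder assumptions:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class TriggerSavepoint {
        public static void main(String[] args) throws Exception {
            // Placeholder values: adjust to your cluster and job
            String jobManager = "http://flink-jobmanager:8081";
            String jobId = "<job-id>";
            String targetDir = "s3://my-bucket/savepoints";

            // POST /jobs/:jobid/savepoints triggers an asynchronous savepoint;
            // "cancel-job": true also stops the job once the savepoint completes.
            String body = "{\"target-directory\": \"" + targetDir + "\", \"cancel-job\": true}";

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(jobManager + "/jobs/" + jobId + "/savepoints"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());

            // The response contains a trigger id; poll
            // GET /jobs/:jobid/savepoints/:triggerid until the savepoint is COMPLETED,
            // then deploy the new image and resubmit the job from the savepoint path.
            System.out.println(response.body());
        }
    }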