Apache Flink - duplicate message processing during job deployments, with ActiveMQ as source


The JDBC sink currently only supports at-least-once semantics, meaning you can get duplicate messages upon recovery. There is currently a draft to add exactly-once support, which will probably be released with Flink 1.11.

Shouldn't the checkpoints help the job recover from where it left off?

Yes, but everything processed between the last successful checkpoint and the failure is replayed on recovery, which produces the observed duplicates. I gave a more detailed answer on a somewhat related topic.
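To illustrate, the checkpoint interval bounds how much data can be replayed (and thus duplicated at an at-least-once sink) after a failure. A minimal sketch with placeholder values (60 s interval, 30 s minimum pause):

```java
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Checkpoint every 60 s; anything processed after the last completed
// checkpoint is replayed on recovery and shows up as duplicates in an
// at-least-once sink such as JDBC.
env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(30_000);
```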

Should I take a checkpoint before I (rolling) deploy a new job?

Absolutely. You should actually use cancel with savepoint. That is the only reliable way to change the topology. Additionally, cancel with savepoint avoids duplicates in the data because it gracefully shuts down the job.
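For reference, the CLI flow looks roughly like this (the bucket path, job id, and jar name are placeholders):

```bash
# Take a savepoint and gracefully cancel the running job;
# the command prints the path of the savepoint it wrote.
flink cancel -s s3://my-bucket/flink/savepoints <jobId>

# Submit the new version, resuming from that savepoint.
flink run -s s3://my-bucket/flink/savepoints/savepoint-xxxxxx -d new-job.jar
```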

What happens if the job quits with an error or the cluster fails?

It should restart automatically (depending on your restart strategy settings). It would use the latest checkpoint for recovery. That would most certainly result in duplicates.
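A minimal sketch of such a restart strategy, with placeholder values (up to 3 attempts, 10 s apart); on each restart Flink restores from the latest completed checkpoint, so records processed after it are replayed:

```java
import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;

// Retry the job up to 3 times, waiting 10 s between attempts.
env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, Time.seconds(10)));
```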

As the jobid keeps changing on every deployment, how does the recovery happens?

You usually point explicitly to the same checkpoint directory (on S3?).
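One way to keep that directory stable across deployments, sketched with a hypothetical S3 bucket, is to configure the state backend and retain checkpoints on cancellation so a newly submitted job (with a new job id) can resume from them:

```java
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;

// Hypothetical bucket; keep this path identical across deployments so the
// checkpoint data outlives any single job id.
env.setStateBackend(new FsStateBackend("s3://my-bucket/flink/checkpoints"));

// Retain completed checkpoints when the job is cancelled, so they can be
// used to start the next deployment (flink run -s <retained-checkpoint-path>).
env.getCheckpointConfig().enableExternalizedCheckpoints(
        ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
```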

As I cannot expect idempotency from the database, is upsert the only way to achieve Exactly-Once processing?

Currently, I do not see a way around it. It should change with 1.11.
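A minimal sketch of what such an upsert sink could look like with the pre-1.11 flink-jdbc `JDBCOutputFormat`, assuming a hypothetical PostgreSQL table `events(id, payload)` (the `ON CONFLICT` clause is PostgreSQL syntax; connection details are placeholders):

```java
import org.apache.flink.api.java.io.jdbc.JDBCOutputFormat;

JDBCOutputFormat upsertFormat = JDBCOutputFormat.buildJDBCOutputFormat()
        .setDrivername("org.postgresql.Driver")
        .setDBUrl("jdbc:postgresql://db-host:5432/mydb")
        .setUsername("flink")
        .setPassword("secret")
        // Replayed records hit the same primary key and simply overwrite
        // the existing row instead of inserting a duplicate.
        .setQuery("INSERT INTO events (id, payload) VALUES (?, ?) "
                + "ON CONFLICT (id) DO UPDATE SET payload = EXCLUDED.payload")
        .setBatchInterval(100)
        .finish();

// rows is a DataStream<Row> whose fields match the query parameters.
rows.writeUsingOutputFormat(upsertFormat);
```

This makes the writes idempotent, so replays after recovery produce the same end state as exactly-once delivery would.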