Apache Flink - duplicate message processing during job deployments, with ActiveMQ as source


The JDBC sink currently only supports at-least-once semantics, meaning you can get duplicate messages upon recovery. There is currently a draft to add exactly-once support, which will probably be released with Flink 1.11.

Shouldn't the checkpoints help the job recover from where it left off?

Yes, but everything processed between the last successful checkpoint and the failure is replayed on recovery, which produces the observed duplicates. I gave a more detailed answer on a somewhat related topic.
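To illustrate, the checkpoint interval bounds how much data can be replayed (and thus duplicated at an at-least-once sink) after a failure. A minimal sketch with placeholder values (60 s interval, 30 s minimum pause):

```java
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Checkpoint every 60 s; anything processed after the last completed
// checkpoint is replayed on recovery and shows up as duplicates in an
// at-least-once sink such as JDBC.
env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(30_000);
```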

Should I take a checkpoint before I (rolling) deploy a new job?

Absolutely. You should actually use cancel with savepoint. That is the only reliable way to change the topology. Additionally, cancel with savepoint avoids duplicates in the data because it gracefully shuts down the job.
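For reference, the CLI flow looks roughly like this (the bucket path, job id, and jar name are placeholders):

```bash
# Take a savepoint and gracefully cancel the running job;
# the command prints the path of the savepoint it wrote.
flink cancel -s s3://my-bucket/flink/savepoints <jobId>

# Submit the new version, resuming from that savepoint.
flink run -s s3://my-bucket/flink/savepoints/savepoint-xxxxxx -d new-job.jar
```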

What happens if the job quits with an error or the cluster fails?

It should restart automatically (depending on your restart strategy settings). It would use the latest checkpoint for recovery. That would most certainly result in duplicates.
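A minimal sketch of such a restart strategy, with placeholder values (up to 3 attempts, 10 s apart); on each restart Flink restores from the latest completed checkpoint, so records processed after it are replayed:

```java
import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;

// Retry the job up to 3 times, waiting 10 s between attempts.
env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, Time.seconds(10)));
```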

As the jobid keeps changing on every deployment, how does the recovery happens?

You usually point explicitly to the same checkpoint directory (on S3?).
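One way to keep that directory stable across deployments, sketched with a hypothetical S3 bucket, is to configure the state backend and retain checkpoints on cancellation so a newly submitted job (with a new job id) can resume from them:

```java
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;

// Hypothetical bucket; keep this path identical across deployments so the
// checkpoint data outlives any single job id.
env.setStateBackend(new FsStateBackend("s3://my-bucket/flink/checkpoints"));

// Retain completed checkpoints when the job is cancelled, so they can be
// used to start the next deployment (flink run -s <retained-checkpoint-path>).
env.getCheckpointConfig().enableExternalizedCheckpoints(
        ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
```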

As I cannot expect idempotency from the database, is upsert the only way to achieve Exactly-Once processing?

Currently, I do not see a way around it. It should change with 1.11.
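A minimal sketch of what such an upsert sink could look like with the pre-1.11 flink-jdbc `JDBCOutputFormat`, assuming a hypothetical PostgreSQL table `events(id, payload)` (the `ON CONFLICT` clause is PostgreSQL syntax; connection details are placeholders):

```java
import org.apache.flink.api.java.io.jdbc.JDBCOutputFormat;

JDBCOutputFormat upsertFormat = JDBCOutputFormat.buildJDBCOutputFormat()
        .setDrivername("org.postgresql.Driver")
        .setDBUrl("jdbc:postgresql://db-host:5432/mydb")
        .setUsername("flink")
        .setPassword("secret")
        // Replayed records hit the same primary key and simply overwrite
        // the existing row instead of inserting a duplicate.
        .setQuery("INSERT INTO events (id, payload) VALUES (?, ?) "
                + "ON CONFLICT (id) DO UPDATE SET payload = EXCLUDED.payload")
        .setBatchInterval(100)
        .finish();

// rows is a DataStream<Row> whose fields match the query parameters.
rows.writeUsingOutputFormat(upsertFormat);
```

This makes the writes idempotent, so replays after recovery produce the same end state as exactly-once delivery would.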