Flink HA JobManager cluster cannot elect a leader
According to the logs, it looks as if the TaskManager cannot connect to the new leader. I assume the same is true for the web UI. The logs say that it tries to connect to flink-job-manager-0.flink-job-svc.flink.svc.cluster.local/10.244.3.166:44013. I cannot tell from the logs whether flink-job-manager-1 binds to this IP, but my suspicion is that the headless service might return multiple IPs and Flink picks the wrong or old one. Could you log into the flink-job-manager-1 pod and check what its IP address is?
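
One way to test the multiple-IPs suspicion is to compare what DNS returns for a hostname with the IP the pod actually has. Below is a minimal Python sketch of that check. It is not part of Flink or Kubernetes tooling: the function names resolve_all and diagnose are made up for illustration, and it assumes Python is available in a pod inside the cluster network.

```python
import socket
import sys


def resolve_all(hostname):
    """Return the sorted, de-duplicated addresses DNS gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in infos})


def diagnose(resolved_ips, pod_ip):
    """Classify the DNS answer relative to the pod's real IP.

    "stale"     -> DNS does not return the pod's IP at all
    "ambiguous" -> DNS returns the pod's IP plus at least one other
    "ok"        -> DNS returns exactly the pod's IP
    """
    if pod_ip not in resolved_ips:
        return "stale"
    if len(resolved_ips) > 1:
        return "ambiguous"
    return "ok"


# Hypothetical invocation: python check_dns.py <hostname> <pod-ip>
if __name__ == "__main__" and len(sys.argv) == 3:
    hostname, pod_ip = sys.argv[1], sys.argv[2]
    ips = resolve_all(hostname)
    print(f"{hostname} resolves to {ips}")
    print(f"pod IP {pod_ip}: {diagnose(ips, pod_ip)}")
```

Run it once with the service name (flink-job-svc.flink.svc.cluster.local) and once with the per-pod name from the log, passing the IP that the pod itself reports, for example from hostname -i inside the pod. A result of "ambiguous" or "stale" would support the theory that Flink is being handed an address that does not belong to the current leader.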
I think you should be able to resolve this problem by defining a dedicated service for each JobManager, or by using the pod hostname instead.