Timeout trying to start Flink job master for checkpointed job
I'm staging jars on Flink before execution using the /jars/upload endpoint. It seems that Flink's performance degrades badly once too many jars have been uploaded: all the REST endpoints become unresponsive, including the /jobs/<job_id> endpoint, and it was taking 1 to 2 minutes to load the job graph overview in the Flink UI. I imagine the REST endpoint is served by the same Akka actor the JobManager uses, so I think I must have hit a tipping point where this started causing timeouts. I've reduced the number of jars from 30-odd to just the 4 latest versions, and Flink is responsive again.
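The cleanup can be automated instead of done by hand. The following is a minimal sketch using only the Python standard library. It assumes the Flink REST API's GET /jars returns a "files" list whose entries carry an "id" and an "uploaded" timestamp, and that DELETE /jars/<jarid> removes a staged jar. The base URL http://localhost:8081, the keep=4 default (chosen to match the "4 latest versions" above), and the function names select_jars_to_delete and prune_jars are hypothetical choices for illustration, not part of Flink.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical base URL of the Flink REST endpoint; adjust for your cluster.
FLINK_URL = "http://localhost:8081"


def select_jars_to_delete(files, keep=4):
    """Return the ids of every uploaded jar except the `keep` most recent.

    `files` is assumed to be the "files" list from GET /jars, where each
    entry has an "id" and an "uploaded" timestamp (assumed epoch millis).
    """
    newest_first = sorted(files, key=lambda f: f["uploaded"], reverse=True)
    return [f["id"] for f in newest_first[keep:]]


def prune_jars(base_url=FLINK_URL, keep=4):
    """List staged jars and DELETE all but the `keep` newest ones.

    Returns the ids that were deleted. Raises urllib.error.HTTPError if
    the cluster rejects a request.
    """
    with urllib.request.urlopen(f"{base_url}/jars") as resp:
        files = json.load(resp).get("files", [])
    deleted = []
    for jar_id in select_jars_to_delete(files, keep):
        # Quote the id in case it contains characters unsafe in a URL path.
        url = f"{base_url}/jars/{urllib.parse.quote(jar_id)}"
        req = urllib.request.Request(url, method="DELETE")
        with urllib.request.urlopen(req):
            pass
        deleted.append(jar_id)
    return deleted
```

Calling prune_jars() right after each /jars/upload (or on a schedule) would keep the number of staged jars bounded. The selection logic is kept separate from the HTTP calls so the "keep the N newest" rule can be checked without a running cluster.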