I am not entirely sure whether changing the Spring Boot version or the embedded web server will fix this, but below is how you can scale this up using Kubernetes and Istio.

  • livenessProbe

If a livenessProbe is configured correctly, Kubernetes restarts a container whose probe keeps failing, so an instance that has stopped responding is replaced automatically. https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/#define-a-liveness-http-request
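As a rough sketch, an HTTP liveness probe on a Spring Boot container could look like the fragment below. The names, image, port, and timings are placeholders I made up, and the path assumes Spring Boot Actuator (2.3 or later) with its liveness health group exposed; adjust it to whatever health endpoint your application actually serves.

```yaml
# Hypothetical Deployment; name, image, port, and timings are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
        - name: my-service
          image: example.com/my-service:1.0.0
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              # Assumes Actuator's liveness health group is enabled.
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 30   # give the JVM time to start
            periodSeconds: 10         # probe every 10 seconds
            timeoutSeconds: 2
            failureThreshold: 3       # restart after 3 consecutive failures
```

Keep initialDelaySeconds generous enough for your startup time, otherwise the container can get restarted before it ever becomes healthy.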

  • Horizontal Pod Autoscaler

Increases or decreases the number of pod replicas based on CPU utilization or custom metrics. https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
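For illustration, a CPU-based autoscaler for a Deployment might be declared roughly like this. The target name, replica bounds, and the 70% threshold are assumptions, and the autoscaling/v2 API version assumes a reasonably recent cluster (older ones use a v2beta version). CPU utilization is computed against the container's CPU request, so the pods need resource requests set and the cluster needs a metrics source such as metrics-server.

```yaml
# Hypothetical HPA; target name, bounds, and threshold are assumptions.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above ~70% of requested CPU
```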

  • Vertical Pod Autoscaler

Increases or decreases the CPU and memory requests of the pod's containers based on observed usage. https://cloud.google.com/kubernetes-engine/docs/concepts/verticalpodautoscaler
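A minimal sketch of a VerticalPodAutoscaler object, assuming the VPA components are installed in the cluster (or enabled, on GKE). The target name, container name, and resource bounds are made up for the example.

```yaml
# Hypothetical VPA; names and bounds are assumptions.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-service
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  updatePolicy:
    updateMode: "Auto"        # VPA may evict pods to apply new requests
  resourcePolicy:
    containerPolicies:
      - containerName: my-service
        minAllowed:
          cpu: 250m
          memory: 256Mi
        maxAllowed:
          cpu: "2"
          memory: 2Gi
```

It is generally advised not to let a VPA and an HPA act on the same CPU or memory metric for the same workload, since they would work against each other.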

  • Cluster Autoscaler

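Increases or decreases the number of nodes in the cluster: it adds nodes when pods cannot be scheduled for lack of resources and removes nodes that stay underutilized. https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler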

  • Istio Rate limiting & Retry mechanism

Limit the number of requests the service receives, and retry the requests that could not be served. https://istio.io/docs/tasks/traffic-management/request-timeouts/ https://istio.io/docs/concepts/traffic-management/#network-resilience-and-testing
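As a sketch of the Istio side, retries and timeouts can be set on a VirtualService, and a DestinationRule can cap how much traffic reaches each instance. Note that the second object uses connection-pool limits (circuit breaking) rather than a true rate limiter; it is shown here as a simpler stand-in. The host name and all numbers are assumptions, and the apiVersion may differ with your Istio release.

```yaml
# Hypothetical Istio config; host name and all values are assumptions.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-service
spec:
  hosts:
    - my-service
  http:
    - route:
        - destination:
            host: my-service
      timeout: 10s                 # overall deadline for the request
      retries:
        attempts: 3                # retry a failed request up to 3 times
        perTryTimeout: 2s
        retryOn: 5xx,connect-failure
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-service
spec:
  host: my-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100        # cap on connections to the service
      http:
        http1MaxPendingRequests: 100
        maxRequestsPerConnection: 10
```

Only retry requests that are safe to repeat; retrying non-idempotent calls can cause duplicate side effects.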