
K8S Ingress: How to limit requests in flight per pod


The NGINX ingress controller allows rate limiting via annotations. You may want to have a look at the limit-rps one:

  • nginx.ingress.kubernetes.io/limit-rps: the number of requests accepted from a given IP each second. The burst limit is set to this limit multiplied by the burst multiplier (default multiplier: 5). When clients exceed this limit, the status code configured in limit-req-status-code (default: 503) is returned.

On top of that, NGINX queues requests using the leaky bucket algorithm: incoming requests are buffered in a FIFO (first-in-first-out) queue and then consumed at the limited rate. The burst value defines the size of this queue, which is what allows requests to exceed the base limit. Once the queue is full, subsequent requests are rejected.
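As a sketch, an Ingress using these annotations could look like the following. The host, service, and resource names here are placeholders, not from your setup:

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: srv-ingress
      annotations:
        # Accept at most 10 requests per second per client IP.
        nginx.ingress.kubernetes.io/limit-rps: "10"
        # Burst queue size = limit-rps * multiplier (10 * 3 = 30 here).
        nginx.ingress.kubernetes.io/limit-burst-multiplier: "3"
    spec:
      ingressClassName: nginx
      rules:
      - host: example.com
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: srv
                port:
                  number: 80

Requests beyond the burst queue will receive the limit-req-status-code response.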

For more detailed reading about traffic limiting and shaping:


Autoscaling on network requests requires custom metrics. Since you are using the NGINX ingress controller, you can first install Prometheus and the Prometheus adapter to export the metrics from the NGINX ingress controller. By default, the NGINX ingress controller already exposes a Prometheus metrics endpoint.

The relationship between the components looks like this:

NGINX ingress <- Prometheus <- Prometheus Adaptor <- custom metrics api service <- HPA controller

Each arrow represents an API call. So, in total, you will have three extra components in your cluster.
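As a rough sketch, the Prometheus adapter can be configured to derive a per-second request-rate metric from the ingress controller's counters. The series and label names below (e.g. nginx_ingress_controller_requests) are assumptions that depend on your controller version; adjust them to what your Prometheus actually scrapes:

    # prometheus-adapter configuration fragment (assumed names).
    rules:
      custom:
      - seriesQuery: 'nginx_ingress_controller_requests{namespace!="",ingress!=""}'
        resources:
          overrides:
            namespace: {resource: "namespace"}
            ingress: {resource: "ingress"}
        name:
          matches: "^(.*)_requests$"
          as: "${1}_requests_per_second"
        metricsQuery: 'rate(<<.Series>>{<<.LabelMatchers>>}[2m])'

The metricsQuery turns the raw request counter into a rate, which is what the HPA will average across pods.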

Once you have set up the custom metrics server, you can scale your app based on metrics from the NGINX ingress controller. The HPA will look like this:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: srv-deployment-custom-hpa
spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: srv-deployment
  minReplicas: 1
  maxReplicas: 100
  metrics:
  - type: Pods
    pods:
      metricName: nginx_srv_server_requests_per_second
      targetAverageValue: 100

I won't go through the actual implementation here because it involves a lot of environment-specific configuration.

Once you have set that up, the HPA object will show the metric values it is pulling through the adapter.
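For example, you can check that the metric is flowing with kubectl; the values in the output below are purely illustrative:

    kubectl get hpa srv-deployment-custom-hpa
    NAME                        REFERENCE                   TARGETS   MINPODS   MAXPODS   REPLICAS
    srv-deployment-custom-hpa   Deployment/srv-deployment   42/100    1         100       3

If TARGETS shows <unknown> instead of a number, the adapter is not serving the metric and the HPA cannot scale on it.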

For rate limiting at the Service object level, you will need a more powerful service mesh. Linkerd2 is designed to be lightweight, so it does not ship with a rate-limiting feature. You can refer to this issue under linkerd2: the maintainers declined to implement rate limiting at the service level and suggest doing it at the Ingress level instead.

AFAIK, Istio and some other advanced service meshes provide a rate-limiting function. If you haven't already deployed Linkerd as your service mesh, you may try Istio instead.

For Istio, you can refer to this document to see how to do rate limiting. But I need to warn you that combining Istio with the NGINX ingress controller may cause trouble: Istio ships with its own ingress gateway, so you will need extra work to make the two coexist.

To conclude, if you can use the HPA with a custom metric for the number of requests, that is the quickest way to solve your traffic-control issue. Only if you still have a hard time with traffic control should you consider Service-level rate limiting.