
Grpc server scaling (bidirectional infinite streaming)


Right now we don't have a good answer: the thread-per-RPC assumption was baked into gRPC Python early and deeply, well before long-lived "just keep a connection open in case either side has anything to say" RPCs were on our radar as a use case.

We're working on better solutions, but they'll likely be a while in coming.

Increasing the number of worker threads definitely sounds like the right answer for the time being. I'd be very curious to hear how it works out, since your threads will be mostly idle most of the time (right?).
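A minimal sketch of that worker-thread bump, assuming the standard `grpcio` API (`grpc.server` taking a `futures.ThreadPoolExecutor`); the servicer registration and port binding are omitted, and `build_server`/`max_workers=500` are illustrative choices, not a recommendation:

```python
from concurrent import futures

# Guard so the sketch degrades gracefully where grpcio isn't installed.
try:
    import grpc
except ImportError:
    grpc = None


def build_server(max_workers=500):
    """Build a gRPC server sized for many concurrent long-lived streams.

    Each open bidirectional stream pins one worker thread for its whole
    lifetime, so max_workers is effectively the cap on concurrent
    streams; raise it well above the default.
    """
    if grpc is None:
        raise RuntimeError("grpcio is not installed")
    return grpc.server(futures.ThreadPoolExecutor(max_workers=max_workers))
```

The trade-off is memory: each mostly-idle thread still costs a stack, which is exactly why this only buys time rather than solving the scaling problem.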

Another option worth trying would be to design an object that implements the interface of futures.ThreadPoolExecutor but internally does more sophisticated multiplexing so it can service a great many more RPCs. It's an idea I've had in mind for a while but haven't gotten around to testing myself.
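To make the interface idea concrete, here is a skeleton of such an object: a hypothetical `MultiplexingExecutor` that satisfies the `concurrent.futures.Executor` contract (`submit`/`shutdown`), which is what `grpc.server` actually requires of its thread pool. This version just services a shared queue with a small fixed pool; the real work of multiplexing blocked streaming handlers (e.g. cooperatively scheduling them instead of letting each one pin a thread) would have to be built on top of this shape:

```python
import queue
import threading
from concurrent import futures


class MultiplexingExecutor(futures.Executor):
    """Sketch of an Executor-interface object that could be handed to
    grpc.server() in place of futures.ThreadPoolExecutor.

    Here submit() only funnels callables through a shared queue to a
    small fixed pool; a real multiplexer would add cooperative
    scheduling so long-lived streams don't each occupy a thread.
    """

    _SENTINEL = object()  # shutdown marker placed on the queue

    def __init__(self, num_threads=4):
        self._work = queue.Queue()
        self._threads = [
            threading.Thread(target=self._worker, daemon=True)
            for _ in range(num_threads)
        ]
        for t in self._threads:
            t.start()

    def _worker(self):
        while True:
            item = self._work.get()
            if item is self._SENTINEL:
                return
            fn, args, kwargs, future = item
            if not future.set_running_or_notify_cancel():
                continue  # the caller cancelled before we got to it
            try:
                future.set_result(fn(*args, **kwargs))
            except BaseException as exc:
                future.set_exception(exc)

    def submit(self, fn, *args, **kwargs):
        future = futures.Future()
        self._work.put((fn, args, kwargs, future))
        return future

    def shutdown(self, wait=True):
        for _ in self._threads:
            self._work.put(self._SENTINEL)
        if wait:
            for t in self._threads:
                t.join()
```

The point of the sketch is only that gRPC's dependency is on the Executor *interface*, not on a thread-per-task implementation, so the multiplexing strategy can be swapped in behind `submit` without touching the server.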