
Tensorflow Serving: When to use it rather than simple inference inside Flask service?


I believe most of the reasons why you would prefer TensorFlow Serving over Flask are related to performance:

  • TensorFlow Serving makes use of gRPC and Protobuf, while a regular Flask web service uses REST and JSON. REST typically runs over HTTP/1.1 while gRPC uses HTTP/2 (there are important differences). In addition, Protobuf is a binary serialization format, which is more compact and faster to parse than JSON (see the gRPC client sketch after this list).
  • TensorFlow Serving can batch requests to the same model, which uses hardware (e.g. GPUs) more efficiently.
  • TensorFlow Serving can manage model versioning.
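
For illustration, here is a minimal sketch of a gRPC Predict call against TF Serving. The model name my_model, the input tensor name input, and port 8500 are all assumptions; check your own model's signature and serving config:

```python
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Assumes TF Serving exposes its gRPC endpoint on localhost:8500.
channel = grpc.insecure_channel("localhost:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = "my_model"                  # hypothetical model name
request.model_spec.signature_name = "serving_default"
# "input" is a hypothetical input tensor name; check your model's signature.
request.inputs["input"].CopyFrom(
    tf.make_tensor_proto([[1.0, 2.0, 3.0]], dtype=tf.float32)
)

# The request and response travel as binary Protobuf over HTTP/2.
response = stub.Predict(request, timeout=5.0)
print(response.outputs)
```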

As with almost everything, it depends a lot on your use case and scenario, so it's important to weigh the pros and cons against your requirements. TensorFlow Serving has great features, but with some effort these features could also be implemented on top of Flask (for instance, you could build your own batching mechanism, as sketched below).
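
As a rough illustration of that effort, here is a minimal, simplified batching sketch for Flask: requests are queued, a background thread groups them into batches, runs one forward pass, and hands each result back. The DummyModel stand-in and the size/timeout values are assumptions; a production version would need more care (error handling, a threaded server, backpressure):

```python
import queue
import threading

import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)

class DummyModel:
    """Stand-in for a real model, e.g. tf.keras.models.load_model(...)."""
    def predict(self, batch):
        return batch.sum(axis=-1, keepdims=True)

model = DummyModel()
MAX_BATCH_SIZE = 8        # assumed values; tune for your workload
BATCH_TIMEOUT_S = 0.01
_pending = queue.Queue()  # (input array, per-request result queue) tuples

def _batch_worker():
    while True:
        items = [_pending.get()]          # block until one request arrives
        try:
            while len(items) < MAX_BATCH_SIZE:
                items.append(_pending.get(timeout=BATCH_TIMEOUT_S))
        except queue.Empty:
            pass                          # timeout hit: run with what we have
        outputs = model.predict(np.stack([x for x, _ in items]))
        for (_, result_q), out in zip(items, outputs):
            result_q.put(out)             # wake up the waiting request

threading.Thread(target=_batch_worker, daemon=True).start()

@app.route("/predict", methods=["POST"])
def predict():
    x = np.asarray(request.get_json()["instances"], dtype=np.float32)
    result_q = queue.Queue()
    _pending.put((x, result_q))
    return jsonify({"prediction": result_q.get().tolist()})
```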


Flask is used to handle request/response, whereas TensorFlow Serving is built specifically for serving flexible ML models in production.

Let's take some scenarios where you want to:

  • Serve multiple models to multiple products (many-to-many relations) at the same time.
  • See which model is making an impact on your product (A/B testing).
  • Update model weights in production, which is as easy as saving a new model to a folder (see the export sketch after this list).
  • Get performance on par with code written in C/C++.
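
For example, a minimal sketch of the versioning point: TF Serving watches a model's base directory and automatically loads the highest-numbered version subdirectory it finds there, so a deploy is just another save. The toy model and the path /models/my_model are assumptions:

```python
import tensorflow as tf

# Toy model; in practice this would be your trained model.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(3,))])

# TF Serving is pointed at /models/my_model (hypothetical path) and serves
# the highest-numbered version subdirectory it finds there.
tf.saved_model.save(model, "/models/my_model/1")

# ...later, after retraining: saving version 2 is the whole deployment.
# TF Serving detects the new folder and swaps traffic over to it.
tf.saved_model.save(model, "/models/my_model/2")
```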

And you can always get all those advantages for FREE by sending requests to TF Serving from Flask.
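
For example, a minimal sketch of that setup, assuming TF Serving exposes its REST API on port 8501 and serves a model named my_model (both assumptions): Flask handles the app-level concerns and forwards inference to TF Serving.

```python
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)

# TF Serving REST endpoint; host, port, and model name are assumptions.
TF_SERVING_URL = "http://localhost:8501/v1/models/my_model:predict"

@app.route("/predict", methods=["POST"])
def predict():
    # App-specific work (auth, validation, feature prep) happens here...
    payload = {"instances": request.get_json()["instances"]}
    # ...while batching, versioning, and the fast runtime live in TF Serving.
    resp = requests.post(TF_SERVING_URL, json=payload, timeout=5)
    resp.raise_for_status()
    return jsonify(resp.json())
```

This way Flask stays a thin layer in front of the model, and the heavy lifting is delegated to TF Serving.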