How to run Flask with Gunicorn in multithreaded mode
You can run your app under Gunicorn with multiple worker processes, multiple threads per worker, or async workers.
Flask app (server.py):
    from flask import Flask

    app = Flask(__name__)

    @app.route("/")
    def hello():
        return "Hello World!"

    if __name__ == "__main__":
        app.run()
Gunicorn with the gevent async worker:
gunicorn server:app -k gevent --worker-connections 1000
Gunicorn with 1 worker and 12 threads:
gunicorn server:app -w 1 --threads 12
Gunicorn with 4 workers (multiprocessing):
gunicorn server:app -w 4
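The same options can also live in a Gunicorn config file, which is plain Python. The following is a minimal sketch, not a tuned setup: the file name gunicorn.conf.py, the bind address, and the helper suggested_workers are assumptions made for illustration. The (2 x cores) + 1 figure is the starting point the Gunicorn documentation suggests, and you should adjust it for your own workload.

```python
# gunicorn.conf.py (assumed file name; pass it with -c)
import multiprocessing


def suggested_workers(cores=None):
    """Hypothetical helper: (2 x cores) + 1, a common starting point."""
    if cores is None:
        cores = multiprocessing.cpu_count()
    return 2 * cores + 1


bind = "127.0.0.1:8000"        # assumed address, change as needed
workers = suggested_workers()  # same setting as -w on the command line
threads = 1                    # same setting as --threads
```

Start it with: gunicorn -c gunicorn.conf.py server:app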
For more information on Flask concurrency, see the post "How many concurrent requests does a single Flask process receive?"
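If memory is a concern, the best thing to do is to preload the application (preload_app = True in the config file, or the --preload flag). Gunicorn always uses a pre-fork model; preloading makes the master process import your code once and then fork off worker processes to handle requests, instead of each worker importing it separately. If you are running on Linux and your model is read-only, the forked workers share the master's memory pages copy-on-write, so most of the physical memory is reused among all the processes.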
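As a sketch of the preload setup, assuming the config file is named gunicorn.conf.py and that four workers suit your machine (both are illustrative choices, not requirements):

```python
# gunicorn.conf.py (assumed file name)
bind = "127.0.0.1:8000"  # assumed address, change as needed
workers = 4              # worker processes forked from the master
preload_app = True       # import the app in the master before forking
```

Run it with: gunicorn -c gunicorn.conf.py server:app. The command-line equivalent is: gunicorn server:app -w 4 --preload. Anything loaded at import time in server.py is then loaded once in the master rather than once per worker.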