Apache/mod_wsgi process dies unexpectedly
If you are using embedded mode of mod_wsgi, that can happen because Apache controls the lifetime of its processes and can recycle them when it decides a process is no longer required due to insufficient traffic.
You might be thinking 'but I am using daemon mode and not embedded mode', but in reality you aren't, because your configuration is wrong. You have:
    <VirtualHost *:5010>
        ServerName localhost

        WSGIDaemonProcess entry user=kesiena group=staff threads=5
        WSGIScriptAlias "/" "/Users/kesiena/Dropbox (MIT)/Sites/onetext/onetext.local.wsgi"

        <Directory "/Users/kesiena/Dropbox (MIT)/Sites/onetext/app">
            WSGIProcessGroup start
            WSGIApplicationGroup %{GLOBAL}
            WSGIScriptReloading On
            Order deny,allow
            Allow from all
        </Directory>
    </VirtualHost>
That Directory block doesn't refer to a directory which matches the path used in WSGIScriptAlias, so none of it applies.
Use:
    <VirtualHost *:5010>
        ServerName localhost

        WSGIDaemonProcess entry user=kesiena group=staff threads=5
        WSGIScriptAlias "/" "/Users/kesiena/Dropbox (MIT)/Sites/onetext/onetext.local.wsgi"

        <Directory "/Users/kesiena/Dropbox (MIT)/Sites/onetext">
            WSGIProcessGroup start
            WSGIApplicationGroup %{GLOBAL}
            Order deny,allow
            Allow from all
        </Directory>
    </VirtualHost>
The only reason it worked at all without that match is that you had already opened up access in Apache to host files from that directory tree by having:
    <Directory "/Users/kesiena/Dropbox (MIT)/Sites">
        Require all granted
    </Directory>
It is also bad practice to set DocumentRoot to a parent directory of where your application source code exists. With the way it is written, there is a risk I could come in on a different port or VirtualHost and download all your application code. Do not stick your application code under the directory listed against DocumentRoot.
BTW, even when you have the WSGI application running in daemon mode, Apache can still recycle the worker processes it uses to proxy requests through to mod_wsgi. So even if your very long-running request keeps running in the WSGI application process, it could fail as soon as it starts to send a response if the worker process that proxied it was recycled in the interim because it had been running too long.
You should definitely farm out the long-running operation to a back-end Celery task queue or similar.
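Below is a minimal sketch of what farming the work out could look like, assuming a Flask app and Celery with a Redis broker; the broker URL, task name and route are placeholders for illustration, not taken from your configuration, and a Celery worker process would need to be started separately.

    # Minimal sketch: queue the slow work in Celery and return immediately.
    from celery import Celery
    from flask import Flask, jsonify

    app = Flask(__name__)
    celery_app = Celery("tasks",
                        broker="redis://localhost:6379/0",    # placeholder broker URL
                        backend="redis://localhost:6379/0")   # placeholder result backend

    @celery_app.task
    def long_running_job(payload):
        # Do the slow work here, outside the Apache/mod_wsgi request cycle.
        ...
        return "done"

    @app.route("/start-job", methods=["POST"])
    def start_job():
        # Hand the work to the queue and respond straight away, so no Apache
        # worker process is held open (and possibly recycled) mid-request.
        result = long_running_job.delay({"example": "payload"})
        return jsonify({"task_id": result.id}), 202

The client can then poll a separate endpoint (or the Celery result backend) for the outcome, instead of waiting on one long HTTP request.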
You might be hitting forced socket closures, though with the times you gave that does not look too likely. For a project I had on Azure, any connection that was idle for about 3 minutes would get closed by the system. I believe these closures happened upstream of the server, in the network routing, so there was no way to disable them or increase the timeout.
Hm, tricky problem.
Guess 1: I had a similar problem once. Have you played around a bit with your KeepAlive time (the KeepAliveTimeout directive)? Set it to 60 minutes or more and test whether the problem persists. More details here: https://httpd.apache.org/docs/2.4/de/mod/core.html
Guess 2: Could Amazon "move" your machine in the background, which interrupts your database connection, or could it be that Flask cannot handle the "unloading" and "loading" of the VM?
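If it is the database connection that gets silently dropped, one possible mitigation (a sketch assuming SQLAlchemy is used for the database layer, which the question does not state) is to have the connection pool test connections before handing them out:

    # Sketch: recover from silently dropped DB connections (assumes SQLAlchemy).
    from sqlalchemy import create_engine, text

    engine = create_engine(
        "postgresql://user:password@dbhost/dbname",  # placeholder DSN
        pool_pre_ping=True,   # test each pooled connection before it is used
        pool_recycle=280,     # recycle connections before any idle-timeout cutoff
    )

    with engine.connect() as conn:
        conn.execute(text("SELECT 1"))

With pool_pre_ping, a connection that died while the VM was paused or moved is replaced transparently instead of failing mid-request.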