
Deploying a (single node) Django Web application with virtually zero downtime on EC2


What might be interesting to look at is a technique called Canary Releasing. I saw a great presentation by Jez Humble last year at a software conference in Amsterdam about low-risk releases; the slides are here.

The idea is not to switch all systems at once, but to send a small set of users to the new version first. Only when all performance metrics of the new system are as expected are the remaining users switched over as well. This technique is also used by big sites like Facebook.
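A minimal sketch of what such a split can look like at the routing layer. The backend addresses, the 1% threshold, and the use of a user id as the routing key are all illustrative assumptions, not anything from the question:

```python
# Hedged sketch of canary routing: send a small, stable subset of users
# to the new version; everyone else stays on the current one.
import hashlib

STABLE_BACKEND = "http://server0.internal"   # current version (assumed name)
CANARY_BACKEND = "http://server1.internal"   # new version (assumed name)
CANARY_PERCENT = 1                           # share of users on the canary

def pick_backend(user_id: str) -> str:
    """Route the same user to the same backend on every request."""
    # Hash the user id into a stable bucket in [0, 100).
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100
    return CANARY_BACKEND if bucket < CANARY_PERCENT else STABLE_BACKEND
```

Hashing the user id (rather than picking randomly per request) keeps each user on one version, so a canary user never bounces between old and new behaviour mid-session.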


The live server should not get migrated. Keep the database reachable from two application servers, server0 and server1. Initially server0 is live and changes are made to server1; when you want to release new software, you simply switch which server is live.

New content should not live on the staging server; it should live in the shared database. Add a version-number column to your content tables and modify your code base to select rows by the correct version number. Develop software that copies old rows to new rows with updated version numbers as needed. Put the current version number in settings.py on server0 and server1, so your code has a central place to refer to when selecting data, or create a database access app that can be updated to fetch the correct version of the content. Template files, of course, can live on each server and will match that server's code.
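A minimal sketch of the version-column idea, assuming a hypothetical Article model and a CONTENT_VERSION setting; none of these names come from the question:

```python
# settings.py on server0/server1 would carry e.g. CONTENT_VERSION = 4,
# bumped at switch time.
from django.conf import settings
from django.db import models

class CurrentVersionManager(models.Manager):
    """Central access point: every query is pinned to the live content version."""
    def get_queryset(self):
        return super().get_queryset().filter(version=settings.CONTENT_VERSION)

class Article(models.Model):
    version = models.IntegerField(db_index=True)  # release this row belongs to
    title = models.CharField(max_length=200)
    body = models.TextField()

    objects = CurrentVersionManager()
```

Any code that goes through Article.objects then sees only the current version, so rows for the next release can be copied in and verified while the old version keeps serving traffic.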

This approach eliminates downtime entirely. You will have to rewrite some of your software, but if you funnel everything through a common access method, such as a single database access layer you can modify in one place, you may find it is not that much work. The up-front investment in a system that specifically supports instant switching will pay off in the long term and will scale to any content size.


If I understand correctly, the problem is that your application is down while the data and schema are being restored to a new database.

Why create a new server in the first place? Why not migrate the database in place (after you have extensively tested the migrations, of course) and, once that is done, update the code and "restart" your processes? Gunicorn, for instance, accepts the HUP signal, which makes it reload the application without dropping connections in the queue.
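A minimal sketch of that reload step, assuming gunicorn was started with --pid /var/run/gunicorn.pid; the pidfile path is an assumption:

```python
import os
import signal

# Read the master process id from gunicorn's pidfile (path is assumed).
with open("/var/run/gunicorn.pid") as f:
    master_pid = int(f.read().strip())

# SIGHUP tells the gunicorn master to reload its configuration and
# gracefully replace its workers, so in-flight requests are not dropped.
os.kill(master_pid, signal.SIGHUP)
```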

Many migrations will not have to lock the database tables at all, so this is safe. For the rest, there are other ways to do it. For instance, if you want to add a new column that has to be populated with correct data first, you can do that in the following steps (briefly described; a sketch using Django's migration framework follows the list):

  1. Add the column as accepting NULL values and make Django start writing to that column, so that new entries get the correct data.
  2. Populate the existing entries.
  3. Make Django start reading from the new column, too.
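A hedged sketch of steps 1 and 2 with Django's migration framework; the app ("blog"), model (Article), new column (author_slug), and the previous migration name are all hypothetical. Step 3 is a plain code change, so it is not shown:

```python
from django.db import migrations, models

def populate_author_slug(apps, schema_editor):
    # Step 2: backfill existing rows; new rows are already being written
    # with the correct value by the application code.
    Article = apps.get_model("blog", "Article")
    for article in Article.objects.filter(author_slug__isnull=True).iterator():
        article.author_slug = article.author_name.lower().replace(" ", "-")
        article.save(update_fields=["author_slug"])

class Migration(migrations.Migration):
    dependencies = [("blog", "0004_previous")]
    operations = [
        # Step 1: adding a NULLable column is a fast, metadata-only change
        # in PostgreSQL, so it does not rewrite the table.
        migrations.AddField("article", "author_slug",
                            models.SlugField(null=True)),
        migrations.RunPython(populate_author_slug, migrations.RunPython.noop),
    ]
```

In practice you would likely split the schema change and the backfill into separate migrations so each runs in its own short transaction, but the shape of the two steps is the same.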