Collaborating on websites with relational databases and a CMS Collaborating on websites with relational databases and a CMS database database

Collaborating on websites with relational databases and a CMS


You put all your database changes in SQL scripts. Put some kind of sequence number into the filename of each script so you know the order they must be run in. Then check in those scripts into your source control system. Now you have reproducible steps that you can apply to test and production databases.


While you could put all your DDL into the VC, this can get very messy very quickly if you try to manage lots and lots of ALTER statements.

Forcing all developers to use the same source database is not a very efficient approach either.

The solution I used was to maintain a file for each database entity specifying how to create the entity (primarily so the changes could be viewed using a diff utility), then manually creating ALTER statements by comparing the release version with the current version - yes, it is rather labour intensive but the only way I've found to solve the problem.

I had a plan to automate the generation of the ALTER statements - it should be relatively straightforward - indeed a quick google found this article and this one. Never got round to implementing one myself since the effort of doing so was much greater than the frequency of schema changes on the projects I was working on.


Where i work, every developer (actually, every development virtual machine) has its own database (or rather, its own schema on a shared Oracle instance). Our working process is based around complete rebuilds. We don't have any ability to modify an existing database - we only ever have the nuclear option of blowing away the whole schema and building from scratch.

We have a little 'drop everything' script, which uses queries on system tables to identify every object in the schema, constructs a pile of SQL to drop them, and runs it. Then we have a stack of DDL files full of CREATE TABLE statements, then we have a stack of XML files containing the initial data for the system, which are loaded by a loading tool. All of this is checked into source control. When a developer does an update from source control, if they see incoming database changes (DDL or data), they run the master build script, which runs them in order to create a fresh database from scratch.

The good thing is that this makes life simple. We never need to worry about diffs, deltas, ALTER TABLE, reversibility, etc, just straightforward DDL and data. We never have to worry about preserving the state of the database, or keeping it clean - you can get back to a clean state at the push of a button. Another important feature of this is that it makes it trivial to set up a new platform - and that means that when we add more development machines, or need to build an acceptance system or whatever, it's easy. I've seen projects fail because they couldn't build new instances from their muddled databases.

The main bad thing is that it takes some time - in our case, due to the particular depressing details of our system, a painfully long time, but i think a team that was really on top of its tools could do a complete rebuild like this in 10 minutes. Half an hour if you have a lot of data. Short enough to be able to do a few times during a working day without killing yourself.

The problem is what you do about data. There are two sides to this: data generated during development, and live data.

Data generated during development is actually pretty easy. People who don't work our way are presumably in the habit of creating that data directly in the database, and so see a problem in that it will be lost when rebuilding. The solution is simple: you don't create the data in the database, you create it in the loader scripts (XML in our case, but you could use SQL DML, or CSV with your database's import tool, or whatever). Think of the loader scripts as being source code, and the database as object code: the scripts are the definitive form, and are what you edit by hand; the database is what's made from them.

Live data is tougher. My company hasn't developed a single process which works in all cases - i don't know if we just haven't found the magic bullet yet, or if there isn't one. One of our projects is taking the approach that live is different to development, and that there are no complete rebuilds; rather, they have developed a set of practices for identifying the deltas when making a new release and applying them manually. They release every few weeks, so it's only a couple of days' work for a couple of people that often. Not a lot.

The project i'm on hasn't gone live yet, but it is replacing an existing live system, so we have a similar problem. Our approach is based on migration: rather than trying to use the existing database, we are migrating all the data from it into our system. We have written a rather sprawling tool to do this, which runs queries against the existing database (a copy of it, not the live version!), then writes the data out as loader scripts. These then feed into the build process just like any others. The migration is scripted, and runs every night as part of our daily build. In this case, the effort needed to write this tool was necessary anyway, because our database is very different in structure to the old one; the ability to do repeatable migrations at the push of a button came for free.

When we go live, one of our options will be to adapt this process to migrate from old versions of our database to new ones. We'll have to write completely new queries, but they should be very easy, because the source database is our own, and the mapping from it to the loader scripts is, as you would imagine, straightforward, even as the new version of the system drifts away from the live version. This would let us keep working in the complete rebuild paradigm - we still wouldn't have to worry about ALTER TABLE or keeping our databases clean, even when we're doing maintenance. I have no idea what the operations team will think of this idea, though!