Database Design for Revisions?

I think the key question to ask here is 'Who / What is going to be using the history'?

If it's going to be mostly for reporting / human readable history, we've implemented this scheme in the past...

Create a table called 'AuditTrail' or something that has the following fields...

[ID] [int] IDENTITY(1,1) NOT NULL,
[UserID] [int] NULL,
[EventDate] [datetime] NOT NULL,
[TableName] [varchar](50) NOT NULL,
[RecordID] [varchar](20) NOT NULL,
[FieldName] [varchar](50) NULL,
[OldValue] [varchar](5000) NULL,
[NewValue] [varchar](5000) NULL

You can then add a 'LastUpdatedByUserID' column to all of your tables which should be set every time you do an update / insert on the table.
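For a hypothetical Employees table, that could be as simple as:

ALTER TABLE Employees
    ADD LastUpdatedByUserID int NULL;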

You can then add a trigger to every table to catch any insert / update that happens and create an entry in this table for each field that's changed. Because the table is also being supplied with the 'LastUpdatedByUserID' for each update / insert, you can access this value in the trigger and use it when adding to the audit table.
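As a rough sketch, such a trigger for a hypothetical Employees table might look like the following (the EmployeeID key and the two audited columns are assumptions; in practice you would emit one INSERT per tracked field):

CREATE TRIGGER trg_Employees_Audit
ON Employees
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- One audit row per changed field; the user comes from the
    -- LastUpdatedByUserID column the application sets on every write.
    INSERT INTO AuditTrail (UserID, EventDate, TableName, RecordID, FieldName, OldValue, NewValue)
    SELECT i.LastUpdatedByUserID, GETDATE(), 'Employees',
           CAST(i.EmployeeID AS varchar(20)), 'FirstName',
           d.FirstName, i.FirstName
    FROM inserted i
    LEFT JOIN deleted d ON d.EmployeeID = i.EmployeeID
    WHERE ISNULL(d.FirstName, '') <> ISNULL(i.FirstName, '');

    INSERT INTO AuditTrail (UserID, EventDate, TableName, RecordID, FieldName, OldValue, NewValue)
    SELECT i.LastUpdatedByUserID, GETDATE(), 'Employees',
           CAST(i.EmployeeID AS varchar(20)), 'LastName',
           d.LastName, i.LastName
    FROM inserted i
    LEFT JOIN deleted d ON d.EmployeeID = i.EmployeeID
    WHERE ISNULL(d.LastName, '') <> ISNULL(i.LastName, '');
END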

We use the RecordID field to store the value of the key field of the table being updated. If it's a combined key, we just do a string concatenation with a '~' between the fields.
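For instance, for a hypothetical table keyed on (OrderID, LineNumber), the trigger would build RecordID with an expression along these lines:

CAST(i.OrderID AS varchar(10)) + '~' + CAST(i.LineNumber AS varchar(10)) AS RecordID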

I'm sure this system has its drawbacks - on heavily updated databases the triggers will take a performance toll - but for my web app we get many more reads than writes and it seems to be performing pretty well. We even wrote a little VB.NET utility to generate the triggers automatically from the table definitions.

Just a thought!


  1. Do not put it all in one table with an IsCurrent discriminator attribute. This just causes problems down the line, forces you onto surrogate keys, and brings all sorts of other issues.
  2. Design 2 does have problems with schema changes. If you change the Employees table you have to change the EmployeeHistories table and all the related sprocs that go with it. That potentially doubles your schema change effort.
  3. Design 1 works well and, if done properly, does not cost much in terms of a performance hit. You could use an XML schema and even XML indexes to get over possible performance problems. Your comment about parsing the XML is valid, but you could easily create a view using XQuery - which you can include in queries and join to. Something like the view below (a usage sketch follows the list)...
CREATE VIEW EmployeeHistory
AS
SELECT
    EmployeeId,
    RevisionXML.value('(/employee/FirstName)[1]', 'varchar(50)')   AS FirstName,
    RevisionXML.value('(/employee/LastName)[1]', 'varchar(100)')   AS LastName,
    RevisionXML.value('(/employee/DepartmentId)[1]', 'integer')    AS DepartmentId
FROM EmployeeHistories
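Once that view exists, the XML is hidden behind ordinary columns, so you can query and join it like any other table. A minimal usage sketch (the Departments table and its Name column are assumptions):

SELECT eh.EmployeeId, eh.FirstName, eh.LastName, d.Name AS DepartmentName
FROM EmployeeHistory eh
JOIN Departments d ON d.DepartmentId = eh.DepartmentId
WHERE eh.LastName = 'Smith';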


The History Tables article on the Database Programmer blog might be useful - it covers some of the points raised here and discusses the storage of deltas.

Edit

In the History Tables essay, the author (Kenneth Downs) recommends maintaining a history table with at least seven columns:

  1. Timestamp of the change,
  2. User that made the change,
  3. A token to identify the record that was changed (where the history is maintained separately from the current state),
  4. Whether the change was an insert, update, or delete,
  5. The old value,
  6. The new value,
  7. The delta (for changes to numerical values).

Columns which never change, or whose history is not required, should not be tracked in the history table to avoid bloat. Storing the delta for numerical values can make subsequent queries easier, even though it can be derived from the old and new values.
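Put together, a history table following those recommendations might look something like this sketch (all names are illustrative, and the delta column only applies to a numeric value such as a salary):

CREATE TABLE EmployeeSalaryHistory
(
    ChangedAt    datetime      NOT NULL,  -- 1. timestamp of the change
    ChangedBy    int           NOT NULL,  -- 2. user that made the change
    EmployeeID   int           NOT NULL,  -- 3. token identifying the changed record
    ChangeType   char(1)       NOT NULL,  -- 4. 'I' = insert, 'U' = update, 'D' = delete
    OldSalary    decimal(10,2) NULL,      -- 5. old value (NULL for an insert)
    NewSalary    decimal(10,2) NULL,      -- 6. new value (NULL for a delete)
    SalaryDelta  decimal(10,2) NULL       -- 7. delta, i.e. NewSalary - OldSalary
);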

The history table must be secured so that non-system users cannot insert, update or delete rows. The only deletions should be periodic purges to keep the overall size down, and only if the use case permits them.
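In SQL Server terms, locking the history table down might look something like this (the role names, and the table from the sketch above, are assumptions):

-- Application users can read history but never rewrite it
GRANT SELECT ON dbo.EmployeeSalaryHistory TO app_users;
DENY INSERT, UPDATE, DELETE ON dbo.EmployeeSalaryHistory TO app_users;

-- Only a dedicated maintenance role may purge old rows
GRANT DELETE ON dbo.EmployeeSalaryHistory TO history_purge_role;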