
Database design for text revisions


I develop a wiki engine and page/article revisions are stored in a database table. Each revision has a sequential revision number, while the "current" revision is marked with -1 (just to avoid NULL).

The text of each revision is stored in full, not as a diff against another revision.

I think that performance is not a problem because you are not likely to access older revisions very frequently.
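To make the numbering scheme above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `page_revision`, the column names, and the helpers `open_store`, `save_page` and `current_text` are made up for illustration and are not taken from the actual wiki engine; SQLite is also just an assumption to keep the example self-contained.

```python
import sqlite3


def open_store():
    """Create an in-memory database with one hypothetical revision table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE page_revision (
               page_id  INTEGER NOT NULL,
               revision INTEGER NOT NULL,  -- -1 marks the current text
               body     TEXT    NOT NULL,  -- full text, no diffing
               PRIMARY KEY (page_id, revision)
           )"""
    )
    return conn


def save_page(conn, page_id, body):
    """Store new text as current and give the old current text a number."""
    with conn:  # commit on success, roll back on error
        highest = conn.execute(
            "SELECT COALESCE(MAX(revision), 0) FROM page_revision "
            "WHERE page_id = ? AND revision > 0",
            (page_id,),
        ).fetchone()[0]
        # The previous current row (if any) becomes the next numbered revision.
        conn.execute(
            "UPDATE page_revision SET revision = ? "
            "WHERE page_id = ? AND revision = -1",
            (highest + 1, page_id),
        )
        conn.execute(
            "INSERT INTO page_revision (page_id, revision, body) "
            "VALUES (?, -1, ?)",
            (page_id, body),
        )


def current_text(conn, page_id):
    """Return the current text of a page, or None if the page is unknown."""
    row = conn.execute(
        "SELECT body FROM page_revision WHERE page_id = ? AND revision = -1",
        (page_id,),
    ).fetchone()
    return row[0] if row else None
```

Reading the current page is then a single primary-key lookup on `(page_id, -1)`, which is the common case; older revisions are only touched when someone asks for the history.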


Given the current state of the art in HDDs, it is just not worth the effort to optimize text storage mechanisms: Document (ID, Name) and DocumentRevision (ID, DocumentID, Contents) tables will do the job. The ID in DocumentRevision may also serve as a "repository"-wide revision number. If this is not the behavior you want, assign a separate VersionID to each DocumentRevision.
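A rough sketch of those two tables, again with Python's sqlite3 module so it can be run as-is. The table and column names come from the description above; the column types, the constraints, and the `add_revision` helper are my own assumptions. It stores both numbers so you can compare them: `ID` grows across the whole repository, `VersionID` restarts at 1 for every document.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Document (
    ID   INTEGER PRIMARY KEY,
    Name TEXT NOT NULL UNIQUE
);
CREATE TABLE DocumentRevision (
    ID         INTEGER PRIMARY KEY AUTOINCREMENT,  -- repository-wide number
    DocumentID INTEGER NOT NULL REFERENCES Document(ID),
    VersionID  INTEGER NOT NULL,                   -- per-document number
    Contents   TEXT    NOT NULL,
    UNIQUE (DocumentID, VersionID)
);
"""


def add_revision(conn, name, contents):
    """Store a new revision; return (repository-wide ID, per-document VersionID)."""
    with conn:  # commit on success, roll back on error
        # Create the document on first use, then look up its ID.
        conn.execute("INSERT OR IGNORE INTO Document (Name) VALUES (?)", (name,))
        doc_id = conn.execute(
            "SELECT ID FROM Document WHERE Name = ?", (name,)
        ).fetchone()[0]
        version = conn.execute(
            "SELECT COALESCE(MAX(VersionID), 0) + 1 FROM DocumentRevision "
            "WHERE DocumentID = ?",
            (doc_id,),
        ).fetchone()[0]
        cur = conn.execute(
            "INSERT INTO DocumentRevision (DocumentID, VersionID, Contents) "
            "VALUES (?, ?, ?)",
            (doc_id, version, contents),
        )
        return cur.lastrowid, version
```

Usage would look something like `conn = sqlite3.connect(":memory:")`, `conn.executescript(SCHEMA)`, then `add_revision(conn, "Home", "some text")`. If the repository-wide number is all you need, the `VersionID` column can simply be dropped.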


Often the most sensible way of tracking the versions of a document is to keep track of the changes made to it. Then, if a particular version is requested, it can be rebuilt from the current document and the partial set of changes.

So if you have a good method of describing the types of changes to a document (this will depend largely on what the document is and how it is used), then by all means use a database to track the changes and therefore the versions.
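As one possible illustration, assuming the document is plain text and a "change" is a line-based delta: keep the current text in full, and for every older version keep only the delta that leads back to it from the next newer one. The sketch below uses `difflib.SequenceMatcher` from the Python standard library to compute the deltas; the `History` class and the `("copy", ...)` / `("insert", ...)` delta format are invented for this example, and everything is kept in memory. In a real system each delta would presumably be serialized into its own row of a changes table.

```python
from difflib import SequenceMatcher


def make_delta(source, target):
    """Describe how to turn the list of lines `source` into `target`."""
    delta = []
    matcher = SequenceMatcher(a=source, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged lines are referenced by position, not stored again.
            delta.append(("copy", i1, i2))
        elif j2 > j1:
            # Lines that exist only in the target are stored verbatim.
            delta.append(("insert", target[j1:j2]))
    return delta


def apply_delta(source, delta):
    """Rebuild the target lines from `source` and a delta from make_delta."""
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(source[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class History:
    """Current text in full, plus backward deltas to older versions."""

    def __init__(self, lines):
        self.current = list(lines)
        # back_deltas[k] turns version k + 2 into version k + 1.
        self.back_deltas = []

    def commit(self, lines):
        new = list(lines)
        self.back_deltas.append(make_delta(new, self.current))
        self.current = new

    def version(self, number):
        latest = len(self.back_deltas) + 1
        if not 1 <= number <= latest:
            raise ValueError(f"no such version: {number}")
        lines = self.current
        # Walk backwards from the current text, newest delta first.
        for delta in reversed(self.back_deltas[number - 1:]):
            lines = apply_delta(lines, delta)
        return lines
```

The trade-off is the usual one: the current version is always available without any work, while the cost of rebuilding an old version grows with how far back it is. Whether that beats storing every revision in full depends on how large the documents are and how often old versions are requested.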