Best practice for a mysql data versioning system Best practice for a mysql data versioning system database database

Best practice for a mysql data versioning system


My choice would be a variation of approach 2. Bold indicates fields in the primary key.

  • You insert every article in a table articles_versioned (id, timestamp, name, text)
  • Your second table is articles (id, timestamp, [name, text]). Note how timestamp is not primary; name and text may be replicated, or you may use a join with articles_versioned (which will be fast since id and timestamp are the articles_versioned primary key)
  • articles_versioned has a trigger on insert that takes the just inserted row and replicates it on articles
  • To restore a specific version of an article you modify the articles table.

The advantages of this approach are:

  1. You get for free another information (the date and time of the article) in your table, that you may need anyway
  2. You do not need to query the database to get the current date. If you use version, you have to.
  3. Your code doesn't have to insert the article in two tables. You simply insert in articles_versioned and read from articles, the db takes care of migrating data as you insert it via the trigger, avoiding any consistency problems.

Con's

  1. In an heavily concurrent environment, two versions may be inserted at the very same time, so one of them may fail. This shouldn't be a problem when inserting user-written articles (it is highly unlikely given precision of timestamps these days). If you don't specify the timestamp in your INSERT statement, but instead you set the datetime field to have the current time as a default value, you may avoid this problem entirely.

To answer the rest of your question. Approach 1 will not lead to longer queries as long as you add an index on status. This makes sense only if you tend to have many different versions of each article; as long as you have 2 versions per article on average or less, the index will only slow you down, and approach 2 would not be sensibly faster anyway (altough I'd still recommend my approach because it simplyfies code, since restoring a version does not require switching status for two rows).

Related resources, like images, should follow a similar versioning. I assume you are saving them on the filesystem; instead of saving them with their real name, use a table (id, image_name) to give to each image an id, then save the image as -id-.jpg. The image_name field will make you able to know what the original file name was (if you care about that). This way you can version images the same way as you version articles, and in articles you would use something like <img src="-id-.jpg">, that you know will remain available forever.