
Using Git to track mysql schema - some questions


Assuming you have a git repo already, do the following in a shell script or whatever:

#!/bin/bash -e
# -e means exit if any command fails
DBHOST=dbhost.yourdomain.com
DBUSER=dbuser
DBPASS=dbpass # do this in a more secure fashion
DBNAME=dbname
GITREPO=/path/to/git/repo

cd $GITREPO
mysqldump -h $DBHOST -u $DBUSER -p$DBPASS -d $DBNAME > $GITREPO/schema.sql # the -d flag means "no data"
git add schema.sql
git commit -m "$DBNAME schema version $(date)"
git push # assuming you have a remote to push to

Then run this script daily from a cron job or what have you.
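For example, assuming the script above is saved as /usr/local/bin/schema-snapshot.sh (a path chosen here just for illustration) and made executable, a crontab entry along these lines runs it every night:

# run the schema snapshot daily at 02:00, logging output
0 2 * * * /usr/local/bin/schema-snapshot.sh >> /var/log/schema-snapshot.log 2>&1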

EDIT: If you place a script at $gitdir/hooks/pre-commit (the name is important), Git will execute it before every commit. This way the state of the DB schema is captured for each commit, which makes sense. Automatically running the dumped SQL against your database on every commit, on the other hand, would blow away your database, which does not make sense.

#!/bin/sh

This shebang line tells the operating system to run the hook as a shell script.

mysqldump -u DBUSER -pDBPASSWORD DATABASE --no-data=true > SQLVersionControl/vc.sql

This is the same as in my answer above; taking the DDL only from the database and storing it in a file.

git add SQLVersionControl/vc.sql

This stages the SQL file, so the fresh dump is included in every commit made to your repository.

exit 0

This exits the script with success, which tells Git to let the commit proceed. It is also possibly dangerous: because the hook exits 0 unconditionally, the commit goes ahead even if mysqldump or git add failed, and you may commit a stale or empty dump over something you wanted to keep.
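Putting the pieces together, here is a minimal sketch of the whole hook with that risk addressed; the set -e line and the mkdir -p are my additions, and DBUSER, DBPASSWORD, and DATABASE are placeholders:

#!/bin/sh
# .git/hooks/pre-commit -- must be executable (chmod +x .git/hooks/pre-commit)
set -e  # a non-zero exit from any command below aborts the commit

mkdir -p SQLVersionControl
mysqldump -u DBUSER -pDBPASSWORD DATABASE --no-data=true > SQLVersionControl/vc.sql
git add SQLVersionControl/vc.sql
exit 0

With set -e, a failed mysqldump makes the hook exit non-zero, and Git aborts the commit instead of recording a bad dump.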


If you're just tracking the schema, put all of the CREATE statements into one .sql file, and add the file to git.

$> mkdir myschema && cd myschema
$> git init
$> echo "CREATE TABLE ..." > schema.sql
$> git add schema.sql
$> git commit -m "Initial import"


IMO the best approach is described here: http://viget.com/extend/backup-your-database-in-git. For your convenience I repeat the most important pieces here.

The trick is to use mysqldump --skip-extended-insert, which writes one INSERT statement per row instead of one huge multi-row INSERT. A changed row then touches only one line of the dump, so git can track and diff it sensibly.
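For the record, a full dump command using that flag might look like this (user and database names are placeholders):

mysqldump --skip-extended-insert -u DBUSER -pDBPASS DBNAME > backup.sql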

There are also some hints regarding repository configuration to reduce disk usage; the settings are listed below, and the git config commands to apply them are sketched after the list. Copied from here:

  • core.compression = 9 : The zlib compression level used for blobs and packs. Level 1 is fast with larger file sizes, level 9 takes more time but results in better compression.
  • repack.usedeltabaseoffset = true : Defaults to false for compatibility reasons, but is supported with Git >=1.4.4.
  • pack.windowMemory = 100m : (Re)packing objects may consume lots of memory. To prevent all your resources from going down the drain, it's useful to put some limits on that. There is also pack.deltaCacheSize.
  • pack.window = 15 : Defaults to 10. With a higher value, Git tries harder to find similar blobs.
  • gc.auto = 1000 : Defaults to 6700. As indicated in the article, it is recommended to run git gc every once in a while. Personally I run git gc --auto every day, so things are only packed when there's enough garbage. git gc --auto normally only triggers the packing mechanism when there are 6700 loose objects around. This flag lowers that threshold.
  • gc.autopacklimit = 10 : Defaults to 50. Every time you run git gc, a new pack is generated from the loose objects. Over time you get too many packs, which wastes space. It is a good idea to combine all packs once in a while into a single pack, so all objects can be combined and deltified. By default git gc does this when there are 50 packs around, but for this situation a lower number may be better.
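As promised above, one way to apply these settings per repository is with git config, using the values from the list (adjust to taste):

git config core.compression 9
git config repack.usedeltabaseoffset true
git config pack.windowMemory 100m
git config pack.window 15
git config gc.auto 1000
git config gc.autopacklimit 10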

Old versions can be pruned by rebasing them out of history; for example, the following replays everything after master~7 onto master~8, dropping the commit at master~7:

git rebase --onto master~8 master~7

(copied from here)