Is there a concurrent-access data storage format that can be version controlled? Is there a concurrent-access data storage format that can be version controlled? database database

Is there a concurrent-access data storage format that can be version controlled?


One solution that comes to my mind is to convert your data into a format that support versioning control. For instance XML. Then when you want restore your data to a specific version, just checkout the corresponfing XML and do the oposite.

There is also an XML based database engine: BaseX

Taken from BaseX site:

BaseX is a light-weight, high-performance and scalable XML Database engine and XPath/XQuery 3.0 Processor, which includes full support for the W3C Update and Full Text extensions. An interactive and user-friendly GUI frontend gives you great insight into your XML documents.

You can take a look at this too (Oracle): Generating XML Data from the Database


Your basic requirements are easy to meet, as @RaydelMiranda already illustrated. For completeness, there exist (mostly young) query language implementations for JSON as well:

You could just store your data in JSON, manage the JSON file with a DVCS of your choice and use one of the above languages to query the data. You can also add basic relationship information to any format by just giving records IDs and referring to those IDs from other records.

As far as I know, there is no tool in existence that at the same time also meets either of the "ideally" requirements (Liquibase as mentioned by @LazyBadger looks interesting, but it doesn't version the database itself). However, it is reasonably easy to add safety of concurrent access to the data file by yourself using locks. Note that concurrent use of the version control is already safe out of the box, if you use a stock (D)VCS.

Relationship integrity is the hardest one to tackle, especially if you stick with the paradigm "create a database file and stick it in a version control system". It would almost certainly require you to do something difficult and hacky, e.g. extending an existing XML/JSON query system to keep the records sorted and to keep an in-memory lookup table to raw file locations (which would be necessary to keep the speed acceptable), or modifying SQLite to use a plain text file format with newlines.

You could also turn the paradigm inside-out: implement version control within an RDBMS like SQLite. That would mostly be a matter of defining a "parent version" relationship and never deleting or updating records, only inserting new versions. To synchronize changes between diverged copies of a database, just exchange all newly inserted records since the last sync and the parent relationship will take care of the rest (modulo some sophistications to prevent primary key conflicts). This is also how stock DVCSs like Git do it. As an added bonus, SQLite and other RDBMSs are already safe to access concurrently.


Note

All up-to-date SCMs have atomic commits, you can just don't worry about it

Follow-up to @RaydelMiranda

From other side you can have SQL backend, but version and exchange only Alter|Update scripts for SQL backend (fullweight solution like Liquibase can be considered - I discovered just now, that it can be used with different backends: "Supports XML, YAML, JSON and SQL formats" and Rollback can be used for undoing database-changes)