How can I diff and patch/merge strings instead of files?


I have done quite a bit of searching for a solution to this. Python's difflib is fairly legit, but unfortunately it tends to require that the diff strings contain the entire original strings with records of what was changed. This differs from, say, a git diff, where you only see what was changed plus some extra context. difflib also provides a function called unified_diff, which does produce a shorter diff, but there is no matching function for rebuilding a string from a string and a diff. For example, if I made a diff out of text1 and text2, called diff1, I couldn't generate text2 out of text1 and diff1.
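To make the limitation concrete, here is a small sketch using only the standard library. The variable names are just for illustration, and the inputs are assumed to end with a newline. It shows that the full ndiff delta can be turned back into text2 with restore(), while the shorter unified_diff output has no counterpart in difflib for applying it.

```python
import difflib

text1 = "line 1\nline 2\n"
text2 = "line 1\nline 2 changed\n"

a = text1.splitlines(keepends=True)
b = text2.splitlines(keepends=True)

# ndiff records every line of both inputs, which is why restore()
# can rebuild either side from the delta alone (1 = first, 2 = second).
full_delta = list(difflib.ndiff(a, b))
rebuilt_text2 = "".join(difflib.restore(full_delta, 2))

# unified_diff keeps only the changed lines plus context.
# difflib has no function that applies this output back onto text1.
short_diff = "".join(difflib.unified_diff(a, b))
print(short_diff)
```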

I have therefore made a simple Python module that allows for strings to be rebuilt, both forwards and backwards, from a single string and its related diffs. It's called merge_in_memory, and can be found at https://github.com/danielmoniz/merge_in_memory. Simply pull the repository and run the setup.py.

A simple example of its usage:

    import merge_in_memory as mim_module

    str1 = """line 1
    line 2"""
    str2 = """line 1
    line 2 changed"""

    merger = mim_module.Merger()
    print(merger.diff_make(str1, str2))

This will output:

    --- 
    +++ 
    @@ -1,2 +1,2 @@
     line 1
    -line 2
    +line 2 changed

Diffs are simply strings (rather than generators, as they are when using difflib). You can create a number of diffs and apply them at once (i.e. fast-forward through a history or track back) with the diff_apply_bulk() function.

To step backwards through the history, simply set the reverse argument to True when calling either diff_bulk() or diff_apply_bulk(). For example:

    merge = self.inline_merge.diff_apply_bulk(text3, [diff1, diff2], reverse=True)

If you started with text1 and generated text2 and text3 with diff1 and diff2, then the above line of code rebuilds text1. Note that the list of diffs is still in ascending order. A 'merge', i.e. the result of applying a diff to a string, is itself a string.
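For anyone curious how applying a unified diff in either direction can work, here is a rough standalone sketch built on difflib. This is not the merge_in_memory source: make_diff, apply_diff and apply_diffs are hypothetical names used only for illustration, and the sketch assumes every line of the input ends with a newline.

```python
import difflib
import re

HUNK = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")

def make_diff(old, new):
    """Return a unified diff of two strings as one string."""
    return "".join(difflib.unified_diff(
        old.splitlines(keepends=True), new.splitlines(keepends=True)))

def apply_diff(text, diff, reverse=False):
    """Apply one unified diff to text; reverse=True undoes it instead."""
    src = text.splitlines(keepends=True)
    out, pos = [], 0
    # When reversing, "+" lines are the ones to drop and "-" lines come back.
    remove, add = ("+", "-") if reverse else ("-", "+")
    lines = iter(diff.splitlines(keepends=True))
    for line in lines:
        m = HUNK.match(line)
        if not m:
            continue  # file header lines before the first hunk
        old_start, old_len, new_start, new_len = (
            int(g) if g is not None else 1 for g in m.groups())
        start, length = (new_start, new_len) if reverse else (old_start, old_len)
        # A zero-length range names the line *before* the insertion point.
        begin = start - 1 if length else start
        out.extend(src[pos:begin])
        pos = begin
        old_seen = new_seen = 0
        while old_seen < old_len or new_seen < new_len:
            body = next(lines)
            tag, content = body[0], body[1:]
            old_seen += tag in " -"
            new_seen += tag in " +"
            if tag == add:
                out.append(content)
                continue
            # Context and removed lines must match the text being patched.
            if pos >= len(src) or src[pos] != content:
                raise ValueError("diff does not apply to this text")
            if tag == " ":
                out.append(content)
            pos += 1
    out.extend(src[pos:])
    return "".join(out)

def apply_diffs(text, diffs, reverse=False):
    """Apply diffs in order, or undo them last-to-first when reverse=True."""
    for d in (reversed(diffs) if reverse else diffs):
        text = apply_diff(text, d, reverse=reverse)
    return text

text1 = "line 1\nline 2\n"
text2 = "line 1\nline 2 changed\n"
text3 = "line 0\nline 1\nline 2 changed\nline 3\n"
diff1, diff2 = make_diff(text1, text2), make_diff(text2, text3)
```

As in the example above, the list of diffs stays in ascending order in both directions: apply_diffs(text1, [diff1, diff2]) walks forward to text3, and apply_diffs(text3, [diff1, diff2], reverse=True) walks back to text1.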

All of this allows me to store diffs in the database as simple VARCHARs (or what-have-you). I can pull them out in order and apply them in either direction to generate the text I want, as long as I have a starting point.
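A minimal sketch of that storage pattern, using an in-memory SQLite database so it runs anywhere. The table name, column names and the make_diff helper are made up for illustration, and difflib.unified_diff stands in for whatever produces your diff strings.

```python
import difflib
import sqlite3

def make_diff(old, new):
    """Return a unified diff of two strings as one string (illustrative helper)."""
    return "".join(difflib.unified_diff(
        old.splitlines(keepends=True), new.splitlines(keepends=True)))

versions = [
    "line 1\nline 2\n",
    "line 1\nline 2 changed\n",
    "line 0\nline 1\nline 2 changed\n",
]

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per diff, ordered by seq within a document.
conn.execute(
    "CREATE TABLE revisions ("
    "doc_id INTEGER, seq INTEGER, diff TEXT, PRIMARY KEY (doc_id, seq))")

# Store only the diffs between consecutive versions, not the versions themselves.
for seq, (old, new) in enumerate(zip(versions, versions[1:]), start=1):
    conn.execute("INSERT INTO revisions VALUES (?, ?, ?)",
                 (1, seq, make_diff(old, new)))

# Pull the diffs back out in ascending order, ready to be applied
# forwards from the first version or backwards from the last one.
stored = [row[0] for row in conn.execute(
    "SELECT diff FROM revisions WHERE doc_id = ? ORDER BY seq", (1,))]
```

Only one full copy of the text (the starting point) needs to be kept alongside these rows.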

Please feel free to leave any comments about this, as it is my first Python module.

Thanks,

ParagonRG


Have a look at libgit2. It is a C library (with bindings for many other languages) that lets you manipulate a git repository in various ways.

It seems pretty low-level so getting it to actually commit, diff and so on might be tedious, but it does at least have a function to add a blob to the repo without it needing to be on disk.

The alternative of course is to create a normal file-based repository and working copy and bounce stuff back and forth between the database and file system using os.system calls.