Generating and applying diffs in Python


Have you looked at diff-match-patch from Google? Apparently Google Docs uses this set of algorithms. It includes not only a diff module but also a patch module, so you can regenerate the newest file from older files and diffs.

A Python version is included.

http://code.google.com/p/google-diff-match-patch/


Does difflib.unified_diff do what you want? There is an example here.
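For reference, a minimal sketch of calling difflib.unified_diff on two strings. The sample strings and the file names old.txt and new.txt are made up for illustration; only the standard library is used.

```python
import difflib

# Hypothetical sample inputs, chosen only for illustration.
old = "one\ntwo\nthree\n"
new = "one\n2\nthree\nfour\n"

# unified_diff works on sequences of lines; keepends=True preserves the
# trailing newlines so the pieces can simply be joined afterwards.
diff = difflib.unified_diff(
    old.splitlines(keepends=True),
    new.splitlines(keepends=True),
    fromfile="old.txt",
    tofile="new.txt",
)
patch_text = "".join(diff)
print(patch_text)
```

Note that difflib only generates the diff; the standard library has no function that applies a unified diff back to a string.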


I've implemented a pure Python function that applies unified diff patches to recover either of the input strings. I hope someone finds it useful. It parses the unified diff format.

    import re

    # Hunk header, e.g. "@@ -3,2 +3,4 @@"; the counts are optional.
    _hdr_pat = re.compile(r"^@@ -(\d+),?(\d+)? \+(\d+),?(\d+)? @@$")

    def apply_patch(s,patch,revert=False):
      """
      Apply unified diff patch to string s to recover newer string.
      If revert is True, treat s as the newer string, recover older string.
      """
      s = s.splitlines(True)
      p = patch.splitlines(True)
      t = ''
      i = sl = 0
      (midx,sign) = (1,'+') if not revert else (3,'-')
      while i < len(p) and p[i].startswith(("---","+++")): i += 1 # skip header lines
      while i < len(p):
        m = _hdr_pat.match(p[i])
        if not m: raise Exception("Cannot process diff")
        i += 1
        # a count of 0 means the hunk applies after the given line
        l = int(m.group(midx))-1 + (m.group(midx+1) == '0')
        t += ''.join(s[sl:l])
        sl = l
        while i < len(p) and p[i][0] != '@':
          # a following "\ No newline at end of file" marker: drop the added newline
          if i+1 < len(p) and p[i+1][0] == '\\': line = p[i][:-1]; i += 2
          else: line = p[i]; i += 1
          if len(line) > 0:
            if line[0] == sign or line[0] == ' ': t += line[1:]
            sl += (line[0] != sign)
      t += ''.join(s[sl:])
      return t

If there are header lines ("--- ...\n","+++ ...\n") it skips over them. If we have a unified diff string diffstr representing the diff between oldstr and newstr:

    # recreate `newstr` from `oldstr`+patch
    newstr = apply_patch(oldstr, diffstr)
    # recreate `oldstr` from `newstr`+patch
    oldstr = apply_patch(newstr, diffstr, True)
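To make the hunk header handling in apply_patch easier to follow, here is a small sketch that parses a header in isolation. The helper name parse_hunk_header is hypothetical and not part of the original answer; the pattern is the same as _hdr_pat, written as a raw string.

```python
import re

# Same pattern as _hdr_pat above, written as a raw string.
hdr_pat = re.compile(r"^@@ -(\d+),?(\d+)? \+(\d+),?(\d+)? @@$")

def parse_hunk_header(line):
    """Hypothetical helper: return (old_start, old_count, new_start, new_count).

    A count omitted from the header means 1 line.
    """
    m = hdr_pat.match(line)
    if m is None:
        raise ValueError("not a hunk header: %r" % line)
    old_start, old_count, new_start, new_count = m.groups()
    # The groups are strings or None; "0" is a non-empty string, so `or 1`
    # only fills in the default when the count is missing.
    return (int(old_start), int(old_count or 1),
            int(new_start), int(new_count or 1))

print(parse_hunk_header("@@ -2 +2,2 @@\n"))
print(parse_hunk_header("@@ -0,0 +1,3 @@\n"))
```

When a count is 0, the start number names the line after which the hunk applies rather than the first line of the hunk. That is why apply_patch adds 1 back to the zero-based index in that case.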

In Python you can generate a unified diff of two strings using difflib (part of the standard library):

    import difflib

    _no_eol = "\\ No newline at end of file"

    def make_patch(a,b):
      """
      Get unified string diff between two strings. Trims top two lines.
      Returns empty string if strings are identical.
      """
      diffs = difflib.unified_diff(a.splitlines(True),b.splitlines(True),n=0)
      # drop the "---" and "+++" header lines
      try: _,_ = next(diffs),next(diffs)
      except StopIteration: pass
      # mark any line that lacks a trailing newline, as diff does
      return ''.join([d if d[-1] == '\n' else d+'\n'+_no_eol+'\n' for d in diffs])

On Unix, the equivalent command is: diff -U0 a.txt b.txt
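The reason make_patch appends a "\ No newline at end of file" marker can be seen in a short sketch. When an input does not end with a newline, difflib.unified_diff yields a diff line without a terminator, which would run into the next line if the pieces were joined as they are. The sample strings below are made up for illustration.

```python
import difflib

# Hypothetical inputs: `old` has no trailing newline, `new` does.
old = "a\nb"
new = "a\nb\nc\n"

raw = list(difflib.unified_diff(
    old.splitlines(True), new.splitlines(True), n=0))

# Collect the diff lines that lack a line terminator.
unterminated = [line for line in raw if not line.endswith("\n")]
print(unterminated)
```

Here the removed line "-b" comes back without a newline, so joining the output directly would fuse it with the following "+b" line.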

Code is on GitHub here along with tests using ASCII and random unicode characters: https://gist.github.com/noporpoise/16e731849eb1231e86d78f9dfeca3abc