Why doesn't diff with ignore matching lines work as expected?


This behaviour is normal given the way diff works (as of April 2013).

diff is line-oriented, which means that a line is considered either completely different or completely identical. A line matching the -I pattern is not removed before comparison: it still enters the list of differing lines, and only when the change script is computed are changes made up entirely of ignorable lines themselves ignored. When ignorable lines are adjacent to other changed lines, together they form a single change that is not ignored.

The problem lies in diff's inability to understand that consecutive lines are unrelated: you are not diffing a sequence of text (which is what diff is aimed at), but rather a list of independent lines, each identified by a key (Tab=> <key>). The two problems look quite similar when both files are generated in the same order, but they are still not the same.
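If the lines really are independent records, one option is to compare them by key instead of by position. Here is a rough sketch of that idea, assuming every record looks like Tab=> 'KEY' ... as in the question and that awk is available; compare_keyed is a hypothetical helper name, and this version only reports keys present in both files whose lines differ:

```shell
# compare_keyed FILE1 FILE2 [IGNORED_KEY]
# Hypothetical helper: compare "Tab=> 'KEY' ..." records by key, not by position.
compare_keyed() {
    awk -F"'" -v skip="${3-}" '
        # Only "Tab=> ..." lines are treated as records; the key is the
        # text between the first pair of single quotes, i.e. field 2.
        !/^Tab=> /           { next }
        # Drop the record whose key we were asked to ignore.
        $2 == skip           { next }
        # First file: remember each record under its key.
        FILENAME == ARGV[1]  { first[$2] = $0; next }
        # Second file: report keys whose record text changed.
        ($2 in first) && first[$2] != $0 {
            print "< " first[$2]
            print "> " $0
        }
    ' "$1" "$2"
}

# Usage: compare_keyed file1.txt file2.txt Memory
```

Because each record is looked up by key, it no longer matters whether the ignored line sits next to another changed line.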


This behaviour does look a bit weird. I noticed something while tweaking your input files (I just moved the "Memory" line to the top in both files):

file1.txt

    ###################################################
    Dump stat Title information for 'ssummary' view
    ###################################################
    Tab=> 'Memory' Title=> {text {Total memory allocated: 962192 kB}}
    Tab=> 'Instance' Title=> {text {Total instances: 7831}}
    Tab=> 'Device' Title=> {text {Total spice devices: 256}}
    Tab=> 'Cpu' Title=> {text {Total cumulative CPU time: 9030 ms}}

file2.txt

    ###################################################
    Dump stat Title information for 'ssummary' view
    ###################################################
    Tab=> 'Memory' Title=> {text {Total memory allocated: 9621932 kB}}
    Tab=> 'Instance' Title=> {text {Total instances: 7831}}
    Tab=> 'Device' Title=> {text {Total spice devices: 256}}
    Tab=> 'Cpu' Title=> {text {Total cumulative CPU time: 90303 ms}}

A plain diff will give you:

    diff file1.txt file2.txt
    4c4
    < Tab=> 'Memory' Title=> {text {Total memory allocated: 962192 kB}}
    ---
    > Tab=> 'Memory' Title=> {text {Total memory allocated: 9621932 kB}}
    7c7
    < Tab=> 'Cpu' Title=> {text {Total cumulative CPU time: 9030 ms}}
    ---
    > Tab=> 'Cpu' Title=> {text {Total cumulative CPU time: 90303 ms}}

Notice that there are now two separate sets of differences. With that arrangement, the diff -I 'Memory' file1.txt file2.txt command works and outputs this:

    7c7
    < Tab=> 'Cpu' Title=> {text {Total cumulative CPU time: 9030 ms}}
    ---
    > Tab=> 'Cpu' Title=> {text {Total cumulative CPU time: 90303 ms}}

In other words, the -I flag seems to take effect only when every changed line in a set of differences matches the expression. I don't know if this is a bug or expected behaviour... but it's certainly inconsistent.
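To see both cases side by side, here is a small self-contained sketch. The file contents and the demo_adjacent_ignore name are made up for the demonstration, and it assumes a diff that supports -I (GNU diff does). In the first pair the "Memory" change is directly next to the "Cpu" change; in the second pair an unchanged line separates them.

```shell
# demo_adjacent_ignore: hypothetical demo of how -I treats adjacent changes.
demo_adjacent_ignore() {
    dir=$(mktemp -d) || return 2

    # Case 1: the Memory change sits right next to the Cpu change.
    printf '%s\n' 'header' 'Memory 1' 'Cpu 1' > "$dir/adjacent1"
    printf '%s\n' 'header' 'Memory 2' 'Cpu 2' > "$dir/adjacent2"

    # Case 2: an unchanged line separates the two changes.
    printf '%s\n' 'Memory 1' 'header' 'Cpu 1' > "$dir/apart1"
    printf '%s\n' 'Memory 2' 'header' 'Cpu 2' > "$dir/apart2"

    echo 'adjacent:'
    diff -I 'Memory' "$dir/adjacent1" "$dir/adjacent2"

    echo 'apart:'
    diff -I 'Memory' "$dir/apart1" "$dir/apart2"

    rm -rf "$dir"
}

# Usage: demo_adjacent_ignore
```

With GNU diff I would expect the "adjacent" section to still list the Memory lines, since they share a set of differences with the Cpu lines, while the "apart" section should show only the Cpu change.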


EDIT : actually, as per the GNU diff documentation, it IS the expected behavior. The man page is not so clear. OpenBSD diff has a -I flag too, but their man page explains it better.


Well you learn something new every day. I was equally confused and frustrated by this behaviour, which seems to be roughly [diff the input files, then filter out the RE] rather than [filter the RE out of the input files, then diff].

I would have thought the second approach more natural and more useful. For instance this seems to be the way --ignore-case and --strip-trailing-cr work, adjusting the input files before diffing. Additionally, actually achieving what the questioner wanted involves filtering both inputs to temp files, diffing them, then removing them. It becomes even more tedious if you want to do a recursive diff as I did.
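For what it's worth, the temp-file dance can be wrapped up in a few lines. This is only a sketch of the [filter, then diff] approach, assuming a POSIX shell with mktemp and grep; filtered_diff is a hypothetical helper name, not a diff feature:

```shell
# filtered_diff PATTERN FILE1 FILE2
# Hypothetical helper: drop every line matching PATTERN from both inputs,
# then diff what is left. Returns the exit status of diff.
filtered_diff() {
    pattern=$1
    tmp1=$(mktemp) || return 2
    tmp2=$(mktemp) || { rm -f "$tmp1"; return 2; }

    # grep -v exits with status 1 when no lines survive; that is fine here.
    grep -v -e "$pattern" "$2" > "$tmp1"
    grep -v -e "$pattern" "$3" > "$tmp2"

    diff "$tmp1" "$tmp2"
    status=$?

    # Remove the temporary copies before returning.
    rm -f "$tmp1" "$tmp2"
    return "$status"
}

# Usage: filtered_diff 'Memory' file1.txt file2.txt
```

One caveat: the line numbers in the output refer to the filtered copies, not to the original files.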

I acknowledge that diff behaves the way it's documented rather than how I want it to behave, but I respectfully suggest that such an option (and similar ones for -b and -w) could usefully be added to diff.