bash, Linux: Set difference between two text files
Somebody showed me how to do exactly this in sh a couple of months ago, and then I couldn't find it for a while... and while looking I stumbled onto your question. Here it is:
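# Lines that appear in either file.
set_union () {
    sort "$1" "$2" | uniq
}
# Lines in $1 that are not in $2 (assumes $1 has no repeated lines).
set_difference () {
    sort "$1" "$2" "$2" | uniq -u
}
# Lines in exactly one of the two files (assumes neither file repeats a line).
set_symmetric_difference () {
    sort "$1" "$2" | uniq -u
}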
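To sanity-check these helpers, here is a minimal sketch that runs them against two throwaway files. The scratch directory, the file names a and b, and their contents are made up for illustration, and the sketch assumes neither file contains duplicate lines of its own.

```shell
#!/usr/bin/env bash
# Sketch only: the helpers from the answer, with arguments quoted.
set_union ()                { sort "$1" "$2" | uniq; }
set_difference ()           { sort "$1" "$2" "$2" | uniq -u; }
set_symmetric_difference () { sort "$1" "$2" | uniq -u; }

# Hypothetical scratch files; names and contents are illustrative.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' bob amber > "$tmp/a"
printf '%s\n' bob ann   > "$tmp/b"

set_union "$tmp/a" "$tmp/b"                  # lines in either file
set_difference "$tmp/a" "$tmp/b"             # lines in a but not in b
set_symmetric_difference "$tmp/a" "$tmp/b"   # lines in exactly one file
```

The trick in set_difference is that listing the second file twice guarantees each of its lines occurs at least twice in the sorted stream, so uniq -u drops them all.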
Use comm: it compares two sorted files line by line.
The short answer to your question
This command returns the lines that appear in deleteNodes but not in keepNodes.
comm -1 -3 <(sort keepNodes) <(sort deleteNodes)
Example setup
Let's create the files named keepNodes and deleteNodes, and use them as unsorted input for the comm command.
$ cat > keepNodes <(echo bob; echo amber;)
$ cat > deleteNodes <(echo bob; echo ann;)
By default, running comm without any options prints 3 tab-separated columns with this layout:
lines_unique_to_FILE1 lines_unique_to_FILE2 lines_which_appear_in_both
Using our example files above, run comm without any options. Note the three columns.
$ comm <(sort keepNodes) <(sort deleteNodes)
amber
	ann
		bob
Suppressing column output
Suppress column 1, 2, or 3 with -1, -2, or -3; note that when a column is hidden, the leading whitespace of the remaining columns shrinks accordingly.
$ comm -1 <(sort keepNodes) <(sort deleteNodes)
ann
	bob
$ comm -2 <(sort keepNodes) <(sort deleteNodes)
amber
	bob
$ comm -3 <(sort keepNodes) <(sort deleteNodes)
amber
	ann
$ comm -1 -3 <(sort keepNodes) <(sort deleteNodes)
ann
$ comm -2 -3 <(sort keepNodes) <(sort deleteNodes)
amber
$ comm -1 -2 <(sort keepNodes) <(sort deleteNodes)
bob
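The two-flag combinations map directly onto the usual set operations. As a rough sketch, they could be wrapped in small bash functions; the function names here are hypothetical (they are not part of comm), the example files are recreated in a scratch directory, and bash is assumed for process substitution.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper names; each sorts its inputs before calling comm.
only_in_first ()  { comm -2 -3 <(sort "$1") <(sort "$2"); }  # lines only in $1
only_in_second () { comm -1 -3 <(sort "$1") <(sort "$2"); }  # lines only in $2
in_both ()        { comm -1 -2 <(sort "$1") <(sort "$2"); }  # lines in both files

# Recreate the example files in a scratch directory (illustrative only).
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' bob amber > "$dir/keepNodes"
printf '%s\n' bob ann   > "$dir/deleteNodes"

only_in_first  "$dir/keepNodes" "$dir/deleteNodes"
only_in_second "$dir/keepNodes" "$dir/deleteNodes"
in_both        "$dir/keepNodes" "$dir/deleteNodes"
```

Because two of the three columns are suppressed in each case, the output has no leading tabs and can be piped straight into other tools.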
Sorting is important!
If you execute comm without first sorting the files, it fails gracefully with a message telling you which file is not sorted.
comm: file 1 is not in sorted order
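If process substitution is not convenient, one option is to sort the files on disk first. The following is a minimal sketch under a few assumptions: the files live in a scratch directory created for the example, LC_ALL=C is exported so that sort and comm agree on the same collation order, and sort -c is used only to skip files that are already sorted.

```shell
#!/usr/bin/env bash
# Use one collation order for both sort and comm (assumption: byte order is acceptable).
export LC_ALL=C

# Illustrative scratch copies of the example files.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' bob amber > "$tmp/keepNodes"
printf '%s\n' bob ann   > "$tmp/deleteNodes"

# sort -c exits non-zero when a file is out of order; sort -o rewrites it in place.
for f in "$tmp/keepNodes" "$tmp/deleteNodes"; do
    sort -c "$f" 2>/dev/null || sort -o "$f" "$f"
done

# Both inputs are now sorted, so comm can read them directly.
result=$(comm -1 -3 "$tmp/keepNodes" "$tmp/deleteNodes")
echo "$result"
```

Sorting in place changes the original files, so this only makes sense when line order in them does not matter.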