
bash, Linux: Set difference between two text files


Somebody showed me how to do exactly this in sh a couple of months ago, and then I couldn't find it for a while; while looking, I stumbled onto your question. Here it is:

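set_union () {
    sort "$1" "$2" | uniq           # lines in either file
}

set_difference () {
    sort "$1" "$2" "$2" | uniq -u   # lines in $1 but not in $2
}

set_symmetric_difference () {
    sort "$1" "$2" | uniq -u        # lines in exactly one of the two files
}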
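The uniq tricks above assume that neither file repeats a line on its own. If the first file contains a line twice, `uniq -u` throws it away, so `set_difference` silently drops it even when the second file lacks it. A minimal sketch of one way around that, assuming bash (for `<( )`) and using a hypothetical name `set_difference_dedup`, is to deduplicate each file with `sort -u` before counting:

```bash
#!/usr/bin/env bash
# Hypothetical variant of set_difference: deduplicate each file with sort -u
# first, so a line repeated inside $1 is not mistaken for a shared line.
set_difference_dedup () {
    sort <(sort -u "$1") <(sort -u "$2") <(sort -u "$2") | uniq -u
}

# Demo on throwaway files: "a" appears twice, but only in the first file.
tmp=$(mktemp -d)
printf '%s\n' a a b c > "$tmp/one"
printf '%s\n' b d > "$tmp/two"
set_difference_dedup "$tmp/one" "$tmp/two"   # prints a and c
rm -r "$tmp"
```

With the same two files, the original `set_difference` prints only `c`.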


Use comm: it compares two sorted files line by line.

The short answer to your question

This command prints the lines that appear in deleteNodes but not in keepNodes.

comm -1 -3 <(sort keepNodes) <(sort deleteNodes)
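The `<( )` process substitution used above is a bash feature (ksh and zsh have it too), not POSIX sh. A minimal sketch for plain sh sorts into temporary files first; it assumes `mktemp -d` is available (common, but not required by POSIX), and `lines_only_in_second` is a hypothetical helper name:

```bash
#!/bin/sh
# Hypothetical helper: print the lines of $2 that are not in $1 without
# bash's <( ) process substitution. Assumes mktemp -d exists.
lines_only_in_second () {
    tmpdir=$(mktemp -d) || return 1
    sort "$1" > "$tmpdir/first"
    sort "$2" > "$tmpdir/second"
    comm -13 "$tmpdir/first" "$tmpdir/second"   # -13 == -1 -3
    rm -r "$tmpdir"
}

# Same sample data as the example setup in this answer.
printf '%s\n' bob amber > keepNodes
printf '%s\n' bob ann > deleteNodes
lines_only_in_second keepNodes deleteNodes   # prints: ann
```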

Example setup

Let's create the files keepNodes and deleteNodes and use them as unsorted input to comm.

$ cat > keepNodes <(echo bob; echo amber;)
$ cat > deleteNodes <(echo bob; echo ann;)

By default, running comm with no options prints three columns, each indented by one more tab than the previous one, with this layout:

lines_unique_to_FILE1    lines_unique_to_FILE2        lines_which_appear_in_both

Using the example files above, run comm with no options. Note the three columns.

$ comm <(sort keepNodes) <(sort deleteNodes)
amber
        ann
                bob

Suppressing column output

Suppress column 1, 2, or 3 with -1, -2, or -3; the flags can be combined. When a column is hidden, the remaining columns shift left, so less leading whitespace is printed.

$ comm -1 <(sort keepNodes) <(sort deleteNodes)
ann
        bob
$ comm -2 <(sort keepNodes) <(sort deleteNodes)
amber
        bob
$ comm -3 <(sort keepNodes) <(sort deleteNodes)
amber
        ann
$ comm -1 -3 <(sort keepNodes) <(sort deleteNodes)
ann
$ comm -2 -3 <(sort keepNodes) <(sort deleteNodes)
amber
$ comm -1 -2 <(sort keepNodes) <(sort deleteNodes)
bob
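The two-flag combinations above correspond to set operations: `-2 -3` keeps lines only in the first file, `-1 -3` lines only in the second, and `-1 -2` lines in both. A small sketch wrapping them as bash functions follows; the function names are hypothetical (not part of comm), and sorting with `sort -u` is an assumption that also drops duplicate lines within each file.

```bash
#!/usr/bin/env bash
# Hypothetical helper names; each maps one set operation onto comm flags.
# sort -u both sorts and deduplicates each input before comparing.
only_in_first ()  { comm -23 <(sort -u "$1") <(sort -u "$2"); }
only_in_second () { comm -13 <(sort -u "$1") <(sort -u "$2"); }
in_both ()        { comm -12 <(sort -u "$1") <(sort -u "$2"); }

# Demo with the same data as keepNodes and deleteNodes, in a temp dir.
tmp=$(mktemp -d)
printf '%s\n' bob amber > "$tmp/keepNodes"
printf '%s\n' bob ann > "$tmp/deleteNodes"
only_in_first  "$tmp/keepNodes" "$tmp/deleteNodes"   # prints: amber
only_in_second "$tmp/keepNodes" "$tmp/deleteNodes"   # prints: ann
in_both        "$tmp/keepNodes" "$tmp/deleteNodes"   # prints: bob
rm -r "$tmp"
```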

Sorting is important!

If you run comm on input that has not been sorted first, it prints a diagnostic naming the file that is out of order, and its comparison cannot be trusted:

comm: file 1 is not in sorted order
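comm expects both files to be ordered by the collating sequence of the current locale (LC_COLLATE), so sort and comm must agree on that locale. Files sorted under one locale and compared under another can trigger the message above even though they were "sorted". A minimal sketch, assuming plain byte-order comparison is acceptable for your data, pins both commands to the C locale; `c_locale_difference` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: lines of $2 that are not in $1, with sort and comm
# both forced to the C (byte-order) collation so their orders always agree.
c_locale_difference () {
    LC_ALL=C comm -13 <(LC_ALL=C sort "$1") <(LC_ALL=C sort "$2")
}

# Mixed case is where locales tend to disagree: in the C locale,
# "Zed" sorts before "amy" because uppercase letters come first.
tmp=$(mktemp -d)
printf '%s\n' amy Zed > "$tmp/first"
printf '%s\n' Zed bob > "$tmp/second"
c_locale_difference "$tmp/first" "$tmp/second"   # prints: bob
rm -r "$tmp"
```

Any locale works as long as sort and comm use the same one; C is chosen here only because its order does not vary between systems.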