How to find set difference of two files?


The BashFAQ describes doing exactly this with comm, which is the canonically correct method.

    # Subtraction of file1 from file2
    # (i.e., only the lines unique to file2)
    comm -13 <(sort file1) <(sort file2)
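To make the column flags concrete, here is a minimal sketch on two small sample files. The file names, their contents, and the temporary directory are invented for the example, and `LC_ALL=C` is an added assumption so that `sort` and `comm` agree on ordering. It uses temporary files instead of process substitution so that it also runs under a plain POSIX shell.

```shell
# Hypothetical sample data; names and contents are for illustration only.
tmpdir=$(mktemp -d)
printf '%s\n' d a c b > "$tmpdir/file1"
printf '%s\n' e b a   > "$tmpdir/file2"

# comm expects sorted input; sort both files under the same locale.
LC_ALL=C sort "$tmpdir/file1" > "$tmpdir/file1.sorted"
LC_ALL=C sort "$tmpdir/file2" > "$tmpdir/file2.sorted"

# -1 hides lines only in the first file, -2 hides lines only in the second,
# -3 hides lines common to both.
only_in_file2=$(LC_ALL=C comm -13 "$tmpdir/file1.sorted" "$tmpdir/file2.sorted")
only_in_file1=$(LC_ALL=C comm -23 "$tmpdir/file1.sorted" "$tmpdir/file2.sorted")

printf 'only in file2:\n%s\n' "$only_in_file2"   # prints: e
printf 'only in file1:\n%s\n' "$only_in_file1"   # prints: c, then d

rm -r "$tmpdir"
```

Swapping `-13` for `-23` reverses the direction of the subtraction, and `-12` would give the intersection instead.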

diff is less appropriate for this task, as it tries to operate on blocks rather than individual lines; as such, the algorithms it has to use are more complex and less memory-efficient.

comm has been part of the Single Unix Specification since SUS2 (1997).


If you simply want lines that are in file A, but not in B, you can sort the files, and compare them with diff.

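    sort A > A.sorted
    sort B > B.sorted
    # Skip the two header lines ("--- A.sorted", "+++ B.sorted"); the first would also match '^-'
    diff -u A.sorted B.sorted | tail -n +3 | grep '^-'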


The 'diff' program is a standard Unix program that shows the differences between files.

    % cat A
    a
    b
    c
    d
    % cat B
    a
    b
    e
    % diff A B
    3,4c3
    < c
    < d
    ---
    > e

With a simple grep and cut one can select the lines that are in A but not in B. Note that the cut is rather simplistic: it splits on spaces, so any line that itself contains a space would be truncated to its first word... but the concept is there.

    % diff A B | grep '^<' | cut -f2 -d" "
    c
    d
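One way around the space problem, offered as a sketch rather than as part of the original answer, is to strip only the leading `< ` marker with sed instead of splitting on spaces with cut. The sample files and the temporary directory below are hypothetical.

```shell
# Hypothetical sample data; one line in A deliberately contains spaces.
tmpdir=$(mktemp -d)
printf '%s\n' 'a' 'b' 'c has spaces' 'd' > "$tmpdir/A"
printf '%s\n' 'a' 'b' 'e'                > "$tmpdir/B"

# diff prefixes lines from the first file with "< ".
# sed -n with the p flag prints only lines where that prefix was removed,
# so the rest of each line (spaces included) is kept intact.
only_in_A=$(diff "$tmpdir/A" "$tmpdir/B" | sed -n 's/^< //p')

printf '%s\n' "$only_in_A"   # prints: c has spaces, then d

rm -r "$tmpdir"
```

As with the grep and cut version, this reports what diff considers removed at each position, so the inputs should be sorted first if a true set difference is wanted.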