diff files comparing only first n characters of each line
Easy starter:
diff <(cut -d' ' -f1 md5s1.txt) <(cut -d' ' -f1 md5s2.txt)
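A quick toy run may make this concrete (the file names match the command above, but the hashes and file lists are invented):

```shell
# Two md5sum-style listings; only the hash of file2 differs.
printf 'aaa111  file1\nbbb222  file2\n' > md5s1.txt
printf 'aaa111  file1\nccc333  file2\n' > md5s2.txt

# Compare only the first (hash) column.
# Note: diff exits with status 1 when the inputs differ.
diff <(cut -d' ' -f1 md5s1.txt) <(cut -d' ' -f1 md5s2.txt) || true
# → 2c2
# → < bbb222
# → ---
# → > ccc333
```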
Also, consider just
diff -EwburqN folder1/ folder2/
(-r recurses, -q reports only which files differ, -N treats absent files as empty, and -E, -b, -w ignore various whitespace-only differences.)
Compare only the md5 column using diff on <(cut -c -32 md5sums.sort.XXX), and tell diff to print just the line numbers of added or removed lines, using --old/new-line-format='%dn'$'\n'. Pipe this into ed md5sums.sort.XXX so it will print only those lines from the md5sums.sort.XXX file.
diff \
    --new-line-format='%dn'$'\n' \
    --old-line-format='' \
    --unchanged-line-format='' \
    <(cut -c -32 md5sums.sort.old) \
    <(cut -c -32 md5sums.sort.new) \
  | ed -s md5sums.sort.new \
  > files-added

diff \
    --new-line-format='' \
    --old-line-format='%dn'$'\n' \
    --unchanged-line-format='' \
    <(cut -c -32 md5sums.sort.old) \
    <(cut -c -32 md5sums.sort.new) \
  | ed -s md5sums.sort.old \
  > files-removed

(The -s flag keeps ed from writing the byte count of the loaded file to standard output, which would otherwise end up as the first line of files-added/files-removed.)
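A toy sanity check of the --*-line-format flags (file contents invented; the ed step is omitted so the line numbers themselves are visible):

```shell
# old.txt line 2 ("bbb") is removed; new.txt line 3 ("ddd") is added.
printf 'aaa\nbbb\nccc\n' > old.txt
printf 'aaa\nccc\nddd\n' > new.txt

# Print only the new-file line numbers of added lines.
# (diff exits 1 here because the files differ.)
diff --new-line-format='%dn'$'\n' \
     --old-line-format='' \
     --unchanged-line-format='' \
     old.txt new.txt || true
# → 3
```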
The problem with ed is that it loads the entire file into memory, which can be a problem if you have a lot of checksums. Instead of piping the output of diff into ed, pipe it into the following command, which uses much less memory.
diff … | (
  lnum=0
  while read -r lprint; do
    while [ "$lnum" -lt "$lprint" ]; do
      read -r line <&3
      lnum=$((lnum+1))
    done
    echo "$line"
  done
) 3<md5sums.sort.XXX
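The loop reads line numbers on stdin and copies only the corresponding lines from the file opened on fd 3, one line at a time. A toy run (file name and contents invented):

```shell
printf 'alpha\nbeta\ngamma\ndelta\n' > lines.txt

# Ask for lines 2 and 4; only those lines of lines.txt are printed.
printf '2\n4\n' | (
  lnum=0
  while read -r lprint; do
    # Advance through fd 3 until we reach the requested line number.
    while [ "$lnum" -lt "$lprint" ]; do
      read -r line <&3
      lnum=$((lnum+1))
    done
    echo "$line"
  done
) 3<lines.txt
# → beta
# → delta
```

Because the requested line numbers arrive in ascending order (as diff emits them), a single forward pass over fd 3 suffices and nothing beyond the current line is held in memory.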
If you are looking for duplicate files, fdupes can do this for you (it takes the directory to scan as an argument):
$ fdupes --recurse .
On Ubuntu you can install it with
$ sudo apt-get install fdupes