Percentage value with GNU Diff

Something like this perhaps?

Two files, A1 and A2.

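$ sdiff -B -b -s A1 A2 | wc -l

gives the number of lines that differ. wc -l on a file gives its total line count, so dividing the first number by the second gives the fraction of lines that differ.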

The -b and -B options ignore changes in the amount of whitespace and blank lines respectively, and -s suppresses the lines common to both files.
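The division step can be sketched as a small bash function. This is only one way to do it: the name diff_percent is made up for illustration, and the choice of denominator (all rows of the side-by-side listing rather than the line count of one file) is an assumption that keeps the result between 0 and 100.

```shell
#!/bin/bash
# diff_percent FILE_A FILE_B   (hypothetical helper, not part of diffutils)
# Prints the integer percentage of side-by-side rows that differ.
diff_percent() {
    local a="$1" b="$2" differing total
    # Rows that sdiff reports as changed, added, or deleted
    # (-s suppresses the rows common to both files).
    differing=$(sdiff -B -b -s "$a" "$b" | wc -l)
    # All rows of the side-by-side listing, common rows included.
    # Assumption: this is the "total" to divide by.
    total=$(sdiff -B -b "$a" "$b" | wc -l)
    # Two empty files: nothing to compare, report 0 instead of dividing by zero.
    if [ "$total" -eq 0 ]; then
        echo 0
        return
    fi
    # Integer arithmetic, so the result is truncated.
    echo $(( 100 * differing / total ))
}
```

Calling diff_percent A1 A2 prints a single integer. Dividing by wc -l of A1 instead would also work, but can exceed 100 when A2 is much longer than A1.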

https://superuser.com/questions/347560/is-there-a-tool-to-measure-file-difference-percentage has a neat solution for this:

wdiff -s file1.txt file2.txt

The -s option prints per-file statistics, including the percentage of words in common. For more options, see man wdiff.


Here's a script that will compare all .txt files and display the ones that have more than 15% duplication:

    #!/bin/bash
    # walk through all files in the current dir (and subdirs)
    # and compare them with other files, showing percentage
    # of duplication.

    # which type files to compare?
    # (wouldn't make sense to compare binary formats)
    ext="txt"

    # support filenames with spaces:
    IFS=$(echo -en "\n\b")

    working_dir="$PWD"
    working_dir_name=$(echo $working_dir | sed 's|.*/||')
    all_files="$working_dir/../$working_dir_name-filelist.txt"
    remaining_files="$working_dir/../$working_dir_name-remaining.txt"

    # get information about files:
    find -type f -print0 | xargs -0 stat -c "%s %n" | grep -v "/\." | \
         grep "\.$ext" | sort -nr > $all_files

    cp $all_files $remaining_files

    while read string; do
        fileA=$(echo $string | sed 's/.[^.]*\./\./')
        tail -n +2 "$remaining_files" > $remaining_files.temp
        mv $remaining_files.temp $remaining_files

        # remove empty lines since they produce false positives
        sed '/^$/d' $fileA > tempA

        echo Comparing $fileA with other files...

        while read string; do
            fileB=$(echo $string | sed 's/.[^.]*\./\./')
            sed '/^$/d' $fileB > tempB

            A_len=$(cat tempA | wc -l)
            B_len=$(cat tempB | wc -l)

            differences=$(sdiff -B -s tempA tempB | wc -l)
            common=$(expr $A_len - $differences)

            percentage=$(echo "100 * $common / $B_len" | bc)
            if [[ $percentage -gt 15 ]]; then
                echo "  $percentage% duplication in" \
                     "$(echo $fileB | sed 's|\./||')"
            fi
        done < "$remaining_files"
        echo " "
    done < "$all_files"

    rm tempA
    rm tempB
    rm $all_files
    rm $remaining_files
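The core of the script is the per-pair calculation: strip empty lines, count the differing rows with sdiff, and express the remaining common lines as a percentage of the second file. Pulled out on its own it might look like the sketch below. The function name duplication_percent, the use of mktemp, the clamp to zero, and the guard for an empty second file are all assumptions added here; they are not in the script.

```shell
#!/bin/bash
# duplication_percent FILE_A FILE_B   (hypothetical helper)
# Prints the integer percentage of FILE_B's non-empty lines
# that also appear in FILE_A, using the same formula as the script.
duplication_percent() {
    local file_a="$1" file_b="$2"
    local tmp_a tmp_b a_len b_len differences common
    tmp_a=$(mktemp) || return 1
    tmp_b=$(mktemp) || return 1
    # Remove empty lines since they produce false positives.
    sed '/^$/d' "$file_a" > "$tmp_a"
    sed '/^$/d' "$file_b" > "$tmp_b"
    a_len=$(wc -l < "$tmp_a")
    b_len=$(wc -l < "$tmp_b")
    # Rows that differ between the two cleaned files.
    differences=$(sdiff -B -s "$tmp_a" "$tmp_b" | wc -l)
    rm -f "$tmp_a" "$tmp_b"
    common=$(( a_len - differences ))
    # Added guard: rows present only in FILE_B can push this below zero.
    if [ "$common" -lt 0 ]; then
        common=0
    fi
    # Added guard: avoid dividing by zero when FILE_B has no non-empty lines.
    if [ "$b_len" -eq 0 ]; then
        echo 0
        return
    fi
    echo $(( 100 * common / b_len ))
}
```

A caller could then keep the 15% threshold from the script, for example by testing the printed value with [[ ... -gt 15 ]]. Using mktemp instead of fixed tempA and tempB names avoids clobbering files that happen to have those names in the working directory.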
