Compare/Difference of two arrays in Bash


printf '%s\n' "${Array1[@]}" "${Array2[@]}" | sort | uniq -u

Output

key10
key7
key8
key9

You can add sorting if you need it.
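`uniq -u` drops every line that occurs more than once in the combined stream, so an element repeated inside one array disappears even when the other array lacks it (with `Array1=( key1 key1 key2 key3 )` and `Array2=( key2 key4 )`, the pipeline reports only `key3` and `key4`). A minimal sketch, assuming Bash 4+ for `mapfile`, that deduplicates each array with `sort -u` before combining; the sample arrays are made up for illustration:

```shell
#!/usr/bin/env bash
# Symmetric difference of Array1 and Array2 (elements in exactly one of them).
# Deduplicating each array with `sort -u` first means an element repeated
# inside one array is still reported when the other array lacks it.
# Assumes both arrays are non-empty: printf on an empty array emits one blank line.
Array1=( "key1" "key1" "key2" "key3" )
Array2=( "key2" "key4" )

# mapfile (Bash 4+) stores one array element per output line.
mapfile -t Array3 < <(
  sort <(printf '%s\n' "${Array1[@]}" | sort -u) \
       <(printf '%s\n' "${Array2[@]}" | sort -u) | uniq -u
)
declare -p Array3   # Array3 holds key1, key3, key4
```

Reading the result with `mapfile -t` instead of `$(...)` also keeps elements that contain spaces intact.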


If you strictly want Array1 - Array2, then

Array1=( "key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10" )
Array2=( "key1" "key2" "key3" "key4" "key5" "key6" )
Array3=()
for i in "${Array1[@]}"; do
    skip=
    for j in "${Array2[@]}"; do
        # quote $j so it is compared literally, not as a glob pattern
        [[ $i == "$j" ]] && { skip=1; break; }
    done
    [[ -n $skip ]] || Array3+=("$i")
done
declare -p Array3

Runtime might be improved with associative arrays, but I personally wouldn't bother. If you're manipulating enough data for that to matter, shell is the wrong tool.
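For reference, the associative-array variant mentioned above might look like the following minimal sketch, assuming Bash 4+ (for `declare -A`) and no empty-string elements (Bash rejects an empty associative-array key). The lookup-table name `inArray2` is hypothetical:

```shell
#!/usr/bin/env bash
# Array1 - Array2 using an associative array as a lookup set (Bash 4+).
# Each membership test is a hash lookup instead of a scan of all of Array2.
# Assumes no element is the empty string (not allowed as an associative key).
Array1=( "key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10" )
Array2=( "key1" "key2" "key3" "key4" "key5" "key6" )

declare -A inArray2=()
for j in "${Array2[@]}"; do
  inArray2[$j]=1        # record every element of Array2
done

Array3=()
for i in "${Array1[@]}"; do
  # keep elements of Array1, in their original order, that Array2 lacks
  [[ -n ${inArray2[$i]-} ]] || Array3+=("$i")
done
declare -p Array3
```

This turns the nested loop into one pass over each array, at the cost of requiring Bash 4.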


For a symmetric difference like Dennis's answer, existing tools like comm work, as long as we massage the input and output a bit (since they work on line-based files, not shell variables).

Here, we tell the shell to use newlines to join the array into a single string, and discard tabs when reading lines from comm back into an array.

$ oldIFS=$IFS IFS=$'\n\t'
$ Array3=($(comm -3 <(echo "${Array1[*]}") <(echo "${Array2[*]}")))
comm: file 1 is not in sorted order
$ IFS=$oldIFS
$ declare -p Array3
declare -a Array3='([0]="key7" [1]="key8" [2]="key9" [3]="key10")'

It complains because, in lexicographical order, key1 < … < key9 > key10. But since both input arrays are sorted the same way, it's fine to ignore that warning. You can use --nocheck-order to get rid of the warning, or add a | sort -u inside each <(…) process substitution if you can't guarantee the order and uniqueness of the input arrays.
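A sketch of the same `comm` approach that sorts and deduplicates inside the process substitutions and avoids touching `IFS`, assuming Bash; the loop variable `line` is illustrative. `comm -3` prints lines unique to the second input with one leading tab, which the loop strips:

```shell
#!/usr/bin/env bash
# Symmetric difference with comm(1), without changing IFS.
# Inputs are sorted and deduplicated inside the process substitutions,
# so comm never sees out-of-order lines and prints no warning.
Array1=( "key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10" )
Array2=( "key1" "key2" "key3" "key4" "key5" "key6" )

Array3=()
# comm -3 prints lines only in file 1 as-is and lines only in file 2
# prefixed with one tab; remove that single leading tab while reading.
while IFS= read -r line; do
  Array3+=("${line#$'\t'}")
done < <(comm -3 <(printf '%s\n' "${Array1[@]}" | sort -u) \
                 <(printf '%s\n' "${Array2[@]}" | sort -u))
declare -p Array3
```

Swapping `-3` for `-23` keeps only the lines unique to the first input, i.e. Array1 - Array2; those lines carry no tab prefix.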


Anytime a question pops up dealing with unique values that may not be sorted, my mind immediately goes to awk. Here is my take on it.

Code

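#!/bin/bash
# Takes two array names with [@] appended, e.g. Array1[@], via indirect expansion.
# Counts +1 for each occurrence in the first array and -1 for each in the second,
# then prints every key whose count is non-zero (order of "for (k in a)" is unspecified).
# Note: a function named diff shadows the diff(1) command inside this script.
diff(){
  awk 'BEGIN{RS=ORS=" "}
       {NR==FNR?a[$0]++:a[$0]--}
       END{for(k in a)if(a[k])print k}' <(echo -n "${!1}") <(echo -n "${!2}")
}
Array1=( "key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10" )
Array2=( "key1" "key2" "key3" "key4" "key5" "key6" )
Array3=($(diff Array1[@] Array2[@]))
echo ${Array3[@]}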

Output

$ ./diffArray.sh
key10 key7 key8 key9

Note: Like the other answers, if there are duplicate keys in an array they will only be reported once; this may or may not be the behavior you are looking for. The awk code to handle duplicates is messier.
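If a one-sided Array1 - Array2 that keeps duplicates and the original order is acceptable, a different awk idiom (not the code above) avoids the counting altogether: load Array2 into a lookup table, then print each line of Array1 the table lacks. A minimal sketch, assuming Bash 4+ for `mapfile`; the sample arrays and the table name `seen` are illustrative. Feeding one element per line also lets keys contain spaces:

```shell
#!/usr/bin/env bash
# Array1 - Array2 with awk, one element per line so keys may contain spaces.
# The first input (Array2) fills the lookup table "seen"; lines of the second
# input (Array1) are printed in their original order if "seen" lacks them.
# Duplicates in Array1 are kept: one output line per occurrence.
Array1=( "key1" "key2" "key7" "key7" "key8" "key 9" )
Array2=( "key1" "key2" )

mapfile -t Array3 < <(
  awk 'NR == FNR { seen[$0]; next } !($0 in seen)' \
      <(printf '%s\n' "${Array2[@]}") <(printf '%s\n' "${Array1[@]}")
)
declare -p Array3
```

Because Array1 is the second input, the output order follows Array1 rather than awk's unspecified `for (k in a)` order.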