Compare/Difference of two arrays in Bash
If you strictly want Array1 - Array2, then:
    Array1=( "key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10" )
    Array2=( "key1" "key2" "key3" "key4" "key5" "key6" )

    Array3=()
    for i in "${Array1[@]}"; do
        skip=
        for j in "${Array2[@]}"; do
            # Quote the right-hand side so it is compared literally, not as a glob pattern.
            [[ $i == "$j" ]] && { skip=1; break; }
        done
        [[ -n $skip ]] || Array3+=("$i")
    done
    declare -p Array3
Runtime might be improved with associative arrays, but I personally wouldn't bother. If you're manipulating enough data for that to matter, shell is the wrong tool.
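For reference, the associative-array variant mentioned above could look roughly like this. It is only a sketch: it assumes bash 4.0 or newer and non-empty elements (an empty string is not a valid associative-array key), and the name `seen` is illustrative.

```shell
#!/usr/bin/env bash
# Sketch of Array1 - Array2 using an associative array as a lookup set.
# Assumes bash 4.0+ and that no element is the empty string.

Array1=( "key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10" )
Array2=( "key1" "key2" "key3" "key4" "key5" "key6" )

# Record every element of Array2 as a key; "seen" is an illustrative name.
declare -A seen=()
for j in "${Array2[@]}"; do
    seen[$j]=1
done

# Keep each element of Array1 that was not recorded above.
Array3=()
for i in "${Array1[@]}"; do
    [[ -n ${seen[$i]:-} ]] || Array3+=("$i")
done

declare -p Array3
```

This replaces the inner loop with a single lookup per element, so the work grows with the combined length of the arrays rather than with their product.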
For a symmetric difference like Dennis's answer, existing tools like comm work, as long as we massage the input and output a bit (since they work on line-based files, not shell variables).
Here, we tell the shell to use newlines to join the array into a single string, and discard tabs when reading lines from comm back into an array.
    $ oldIFS=$IFS IFS=$'\n\t'
    $ Array3=($(comm -3 <(echo "${Array1[*]}") <(echo "${Array2[*]}")))
    comm: file 1 is not in sorted order
    $ IFS=$oldIFS
    $ declare -p Array3
    declare -a Array3='([0]="key7" [1]="key8" [2]="key9" [3]="key10")'
It complains because, by lexicographical sorting, key1 < … < key9 > key10. But since both input arrays are sorted similarly, it's fine to ignore that warning. You can use --nocheck-order to get rid of the warning, or add a | sort -u inside the <(…) process substitution if you can't guarantee the order and uniqueness of the input arrays.
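A minimal sketch of the sort -u variant, for inputs that are neither sorted nor unique. It assumes bash 4.0 or newer (for mapfile), non-empty arrays, and elements without embedded tabs or newlines; `sorted_lines` is a hypothetical helper name, and the sample data is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: symmetric difference with comm on unsorted, possibly duplicated input.
# Assumes bash 4.0+, non-empty arrays, and elements without tabs or newlines.

Array1=( "key3" "key10" "key1" "key2" "key2" )
Array2=( "key2" "key1" "key7" )

# Hypothetical helper: print the arguments one per line, sorted and de-duplicated.
sorted_lines() {
    printf '%s\n' "$@" | sort -u
}

# comm -3 suppresses lines common to both inputs. Lines unique to the second
# input are printed with a leading tab, which tr removes.
mapfile -t Array3 < <(
    comm -3 <(sorted_lines "${Array1[@]}") <(sorted_lines "${Array2[@]}") | tr -d '\t'
)

declare -p Array3
```

Because sort and comm run under the same locale here, comm sees the ordering it expects and the warning does not appear. Reading the result with mapfile also avoids having to change and restore IFS.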
Anytime a question pops up dealing with unique values that may not be sorted, my mind immediately goes to awk. Here is my take on it.
Code
    #!/bin/bash

    diff(){
      awk 'BEGIN{RS=ORS=" "}
           {NR==FNR?a[$0]++:a[$0]--}
           END{for(k in a)if(a[k])print k}' <(echo -n "${!1}") <(echo -n "${!2}")
    }

    Array1=( "key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10" )
    Array2=( "key1" "key2" "key3" "key4" "key5" "key6" )
    Array3=($(diff Array1[@] Array2[@]))
    echo ${Array3[@]}
Output
    $ ./diffArray.sh
    key10 key7 key8 key9
Note: Like other answers given, if there are duplicate keys in an array they will only be reported once; this may or may not be the behavior you are looking for. The awk code to handle that is messier.
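If duplicates should count, one possible approach is to print each key once per surplus occurrence. The following is only a sketch of that idea, not the code from the answer above: `multiset_diff` is a hypothetical name, the sample data is invented, and it assumes bash 4.0 or newer, a non-empty first array, and elements without embedded newlines.

```shell
#!/usr/bin/env bash
# Sketch: symmetric difference that respects how often each key occurs.
# Assumes bash 4.0+, a non-empty first array, and no newlines inside elements.

Array1=( "key1" "key1" "key2" "key3" )
Array2=( "key1" "key3" "key4" "key4" )

# multiset_diff is a hypothetical helper. It takes two "Name[@]" strings and
# expands them indirectly, the same way the diff function above does.
multiset_diff() {
    awk '
        NR == FNR { count[$0]++; next }   # first input: count up
                  { count[$0]-- }         # second input: count down
        END {
            for (k in count) {
                n = count[k] < 0 ? -count[k] : count[k]
                while (n-- > 0) print k   # one line per surplus occurrence
            }
        }
    ' <(printf '%s\n' "${!1}") <(printf '%s\n' "${!2}") | sort
}

mapfile -t Array3 < <(multiset_diff 'Array1[@]' 'Array2[@]')
declare -p Array3
```

The trailing sort is there only because awk does not guarantee the iteration order of `for (k in count)`.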