Comparing two arrays & get the values which are not common


PS > $c = Compare-Object -ReferenceObject (1..5) -DifferenceObject (1..6) -PassThru
PS > $c
6


Collection

$a = 1..5
$b = 4..8

$Yellow = $a | Where {$b -NotContains $_}

$Yellow contains all the items in $a except the ones that are in $b:

PS C:\> $Yellow
1
2
3

$Blue = $b | Where {$a -NotContains $_}

$Blue contains all the items in $b except the ones that are in $a:

PS C:\> $Blue
6
7
8

$Green = $a | Where {$b -Contains $_}

Not asked in the question, but for completeness: $Green contains the items that are in both $a and $b.

PS C:\> $Green
4
5

Note: Where is an alias of Where-Object. Aliases can introduce problems and make scripts harder to maintain.


Addendum 12 October 2019

As commented by @xtreampb and @mklement0: although the example in the question does not show it, the task that the question implies (values "not in common") is the symmetric difference between the two input sets (the union of yellow and blue).

Union

The symmetric difference between $a and $b can be literally defined as the union of $Yellow and $Blue:

$NotGreen = $Yellow + $Blue

Which is written out:

$NotGreen = ($a | Where {$b -NotContains $_}) + ($b | Where {$a -NotContains $_})

Performance

As you might notice, this syntax contains quite a few (redundant) loops: every item in list $a is checked (using Where) against the items in list $b (using -NotContains), and vice versa. Unfortunately, the redundancy is difficult to avoid because the result of each side is difficult to predict. A hash table is usually a good way to improve the performance of redundant loops. For this, I like to redefine the question: get the values that appear exactly once in the sum of the collections ($a + $b):

$Count = @{}
$a + $b | ForEach-Object {$Count[$_] += 1}
$Count.Keys | Where-Object {$Count[$_] -eq 1}

By using the ForEach statement instead of the ForEach-Object cmdlet, and the Where method instead of the Where-Object cmdlet, you might increase the performance by a factor of about 2.5:

$Count = @{}
ForEach ($Item in $a + $b) {$Count[$Item] += 1}
$Count.Keys.Where({$Count[$_] -eq 1})

LINQ

But Language Integrated Query (LINQ) will easily beat any native PowerShell and native .NET methods (see also High Performance PowerShell with LINQ and mklement0's answer for Can the following Nested foreach loop be simplified in PowerShell?).

To use LINQ you need to explicitly define the array types:

[Int[]]$a = 1..5
[Int[]]$b = 4..8

And use the static methods of [Linq.Enumerable]:

$Yellow   = [Int[]][Linq.Enumerable]::Except($a, $b)
$Blue     = [Int[]][Linq.Enumerable]::Except($b, $a)
$Green    = [Int[]][Linq.Enumerable]::Intersect($a, $b)
$NotGreen = [Int[]]([Linq.Enumerable]::Except($a, $b) + [Linq.Enumerable]::Except($b, $a))
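A related .NET option, not covered or benchmarked in this answer, is the SymmetricExceptWith method of System.Collections.Generic.HashSet, which computes the symmetric difference in place. The following is only a minimal sketch, assuming PowerShell 5 or later (for the ::new syntax) and integer items; the variable name $Set is chosen purely for illustration. Note that a set drops duplicate values and does not guarantee any order, hence the explicit sort.

```powershell
# Assumption: both inputs contain integers only.
$a = 1..5
$b = 4..8

# Build a set from $a; the cast picks the HashSet(IEnumerable[int]) constructor.
$Set = [System.Collections.Generic.HashSet[int]]::new([int[]]$a)

# Keep only the items that are in exactly one of the two collections (modifies $Set in place).
$Set.SymmetricExceptWith([int[]]$b)

# A HashSet has no guaranteed order, so sort before converting back to an array.
$NotGreen = [int[]]($Set | Sort-Object)
$NotGreen
```

For the sample values, this should list the same items as $Yellow + $Blue.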

Benchmark

Benchmark results depend heavily on the sizes of the collections and on how many items are actually shared. As an "average" case, I assume that half of each collection is shared with the other.

Using             Time (ms)
Compare-Object    111,9712
NotContains       197,3792
ForEach-Object    82,8324
ForEach Statement 36,5721
LINQ              22,7091

To get a good performance comparison, caches should be cleared by e.g. starting a fresh PowerShell session.

$a = 1..1000
$b = 500..1500
(Measure-Command {
    Compare-Object -ReferenceObject $a -DifferenceObject $b  -PassThru
}).TotalMilliseconds
(Measure-Command {
    ($a | Where {$b -NotContains $_}), ($b | Where {$a -NotContains $_})
}).TotalMilliseconds
(Measure-Command {
    $Count = @{}
    $a + $b | ForEach-Object {$Count[$_] += 1}
    $Count.Keys | Where-Object {$Count[$_] -eq 1}
}).TotalMilliseconds
(Measure-Command {
    $Count = @{}
    ForEach ($Item in $a + $b) {$Count[$Item] += 1}
    $Count.Keys.Where({$Count[$_] -eq 1})
}).TotalMilliseconds
[Int[]]$a = $a
[Int[]]$b = $b
(Measure-Command {
    [Int[]]([Linq.Enumerable]::Except($a, $b) + [Linq.Enumerable]::Except($b, $a))
}).TotalMilliseconds


Look at Compare-Object

Compare-Object $a1 $b1 | ForEach-Object { $_.InputObject }

Or, if you would like to know which array each object belongs to, look at SideIndicator:

$a1=@(1,2,3,4,5,8)
$b1=@(1,2,3,4,5,6)
Compare-Object $a1 $b1
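To make the SideIndicator idea concrete, here is a small sketch that splits the Compare-Object output by side. The variable names $Diff, $OnlyInA and $OnlyInB are made up for this illustration; '<=' marks items found only in the reference object and '=>' marks items found only in the difference object.

```powershell
$a1 = @(1,2,3,4,5,8)
$b1 = @(1,2,3,4,5,6)

# By default, Compare-Object returns only the items that differ between the two inputs.
$Diff = Compare-Object -ReferenceObject $a1 -DifferenceObject $b1

# '<=' : present only in the reference object ($a1)
$OnlyInA = @($Diff | Where-Object { $_.SideIndicator -eq '<=' } | ForEach-Object { $_.InputObject })

# '=>' : present only in the difference object ($b1)
$OnlyInB = @($Diff | Where-Object { $_.SideIndicator -eq '=>' } | ForEach-Object { $_.InputObject })

"Only in a1: $OnlyInA"
"Only in b1: $OnlyInB"
```

Wrapping each pipeline in @() keeps the result an array even when only one item (or none) matches.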