Unable to completely parse XML in PowerShell


XML is a structured text format. It knows nothing about "folders". What you see in your screenshots is just how the data is rendered by the program you use for displaying it.

Anyway, the best approach to get what you want is using SelectNodes() with an XPath expression. As usual.

[xml]$xml = Get-Content 'X:\folder\my.xml'
$xml.SelectNodes('//Product/Item[@Class="Patch"]') |
    Select-Object BulletinID, PatchName, Status


tl;dr

As you suspected, a name collision prevented access to the .Item property on the XML elements of interest; fix the problem with explicit enumeration of the parent elements:

$xml.PatchScan.Machine.Product | % { $_.Item | select BulletinId, PatchName, Status }

% is a built-in alias for the ForEach-Object cmdlet; see bottom section for an explanation.


As an alternative, Ansgar Wiechers' helpful answer offers a concise XPath-based solution, which is both efficient and allows for sophisticated queries.

As an aside: PowerShell v3+ comes with the Select-Xml cmdlet, which takes a file path as an argument, allowing for a single-pipeline solution:

(Select-Xml -LiteralPath X:\folder\my.xml '//Product/Item[@Class="Patch"]').Node |
  Select-Object BulletinId, PatchName, Status

Select-Xml wraps the matching XML nodes in an outer object, hence the need to access the .Node property.
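To make the wrapping concrete, here is a minimal sketch that runs Select-Xml against an in-memory document rather than a file. The sample XML and the variable names $doc and $hits are made up for illustration; only the element names mirror the ones used in this answer.

```powershell
# Hypothetical sample document (not the asker's actual file).
$doc = [xml] @'
<Product>
  <Item Class="Patch"><BulletinId>MSAF-054</BulletinId><Status>Installed</Status></Item>
  <Item Class="Other"><BulletinId>MSAF-999</BulletinId><Status>n/a</Status></Item>
</Product>
'@

# Select-Xml emits wrapper objects, not the XML nodes themselves.
# @(...) ensures an array even if there is only one match.
$hits = @(Select-Xml -Xml $doc -XPath '//Product/Item[@Class="Patch"]')
$hits[0].GetType().Name      # -> SelectXmlInfo

# The matching element is stored in .Node; dot notation works on it as usual.
$hits[0].Node.BulletinId     # -> MSAF-054
```

The -Xml parameter is used here only so that the sketch is self-contained; with a file you would pass -LiteralPath as shown above.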


PowerShell's adaptation of the XML DOM (dot notation):

PowerShell decorates the object hierarchy contained in [System.Xml.XmlDocument] instances (created with cast [xml], for instance):

  • with properties named for the input document's specific elements and attributes[1] at every level; e.g.:

     ([xml] '<foo><bar>baz</bar></foo>').foo.bar # -> 'baz'
     ([xml] '<foo><bar id="1" /></foo>').foo.bar.id # -> '1'
  • turning multiple elements of the same name at a given hierarchy level implicitly into arrays (specifically, of type [object[]]); e.g.:

     ([xml] '<foo><C>one</C><C>two</C></foo>').foo.C[1] # -> 'two'

As the examples (and your own code in the question) show, this allows for access via convenient dot notation.

Note: If you use dot notation to target an element that has at least one attribute and/or child element, the element itself is returned (an XmlElement instance); otherwise, the element's text content is returned. For information about updating XML documents via dot notation, see this answer.
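A quick way to see this distinction is to inspect the type of what dot notation returns. The following is a small sketch with a made-up document; the element names a, b, c and d are purely illustrative.

```powershell
# Hypothetical document: <a> has only text, <b> has an attribute,
# <c> has a child element.
$doc = [xml] '<foo><a>text</a><b id="1">text</b><c><d>text</d></c></foo>'

$doc.foo.a.GetType().Name   # -> String      (text-only element: text content)
$doc.foo.b.GetType().Name   # -> XmlElement  (has an attribute)
$doc.foo.c.GetType().Name   # -> XmlElement  (has a child element)

# When the element itself is returned, its text is still available,
# e.g. via the intrinsic .InnerText property.
$doc.foo.b.InnerText        # -> text
```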

The downside of dot notation is that there can be name collisions, if an incidental input-XML element name happens to be the same as either an intrinsic [System.Xml.XmlElement] property name (for single-element properties), or an intrinsic [Array] property name (for array-valued properties; [System.Object[]] derives from [Array]).

In the event of a name collision: If the property being accessed contains:

  • a single child element ([System.Xml.XmlElement]), the incidental properties win.

    • This too can be problematic, because it makes accessing intrinsic type properties unpredictable - see bottom section.
  • an array of child elements, the [Array] type's properties win.

    • Therefore, the following element names break dot notation with array-valued properties (obtained with reflection command
      Get-Member -InputObject 1, 2 -Type Properties, ParameterizedProperty):

          Item Count IsFixedSize IsReadOnly IsSynchronized Length LongLength Rank SyncRoot

See the last section for a discussion of this difference and for how to gain access to the intrinsic [System.Xml.XmlElement] properties in the event of a collision.
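The array case can be reproduced in a few lines. This is a minimal sketch with a made-up document in which the element name Count (one of the names listed above) collides with the array's own property; $doc and $values are illustrative names.

```powershell
# Hypothetical document: <Count> is an incidental element name that
# collides with the [Array] type's own .Count property.
$doc = [xml] '<r><p><Count>a</Count></p><p><Count>b</Count></p></r>'

# .p is array-valued (two <p> elements), so the array's own property wins.
$doc.r.p.Count                                     # -> 2

# Enumerating explicitly accesses .Count on each XmlElement instead,
# where the incidental child element is found.
$values = $doc.r.p | ForEach-Object { $_.Count }
$values                                            # -> a, b
```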

The workaround is to use explicit enumeration of array-valued properties, using the ForEach-Object cmdlet, as demonstrated at the top.
Here is a complete example:

[xml] $xml = @'
<PatchScan>
  <Machine>
    <Product>
      <Name>Windows 10 Pro (x64)</Name>
      <Item Class="Patch">
        <BulletinId>MSAF-054</BulletinId>
        <PatchName>windows10.0-kb3189031-x64.msu</PatchName>
        <Status>Installed</Status>
      </Item>
      <Item Class="Patch">
        <BulletinId>MSAF-055</BulletinId>
        <PatchName>windows10.0-kb3189032-x64.msu</PatchName>
        <Status>Not Installed</Status>
      </Item>
    </Product>
    <Product>
      <Name>Windows 7 Pro (x86)</Name>
      <Item Class="Patch">
        <BulletinId>MSAF-154</BulletinId>
        <PatchName>windows7-kb3189031-x86.msu</PatchName>
        <Status>Partly Installed</Status>
      </Item>
      <Item Class="Patch">
        <BulletinId>MSAF-155</BulletinId>
        <PatchName>windows7-kb3189032-x86.msu</PatchName>
        <Status>Uninstalled</Status>
      </Item>
    </Product>
  </Machine>
</PatchScan>
'@

# Enumerate the array-valued .Product property explicitly, so that
# the .Item property can successfully be accessed on each XmlElement instance.
$xml.PatchScan.Machine.Product |
  ForEach-Object { $_.Item | Select-Object BulletinID, PatchName, Status }

The above yields:

BulletinID PatchName                     Status
---------- ---------                     ------
MSAF-054   windows10.0-kb3189031-x64.msu Installed
MSAF-055   windows10.0-kb3189032-x64.msu Not Installed
MSAF-154   windows7-kb3189031-x86.msu    Partly Installed
MSAF-155   windows7-kb3189032-x86.msu    Uninstalled

Further down the rabbit hole: What properties are shadowed when:

Note: By shadowing I mean that in the case of a name collision, the "winning" property - the one whose value is reported - effectively hides the other one, thereby "putting it in the shadow".


In the case of using dot notation with arrays, a feature called member enumeration comes into play, which applies to any collection in PowerShell v3+; in other words: the behavior is not specific to the [xml] type.

In short: accessing a property on a collection implicitly accesses the property on each member of the collection (item in the collection) and returns the resulting values as an array ([System.Object[]]); e.g.:

# Using member enumeration, collect the value of the .prop property from
# the array's individual *members*.
> ([pscustomobject] @{ prop = 10 }, [pscustomobject] @{ prop = 20 }).prop
10
20

However, if the collection type itself has a property by that name, the collection's own property takes precedence; e.g.:

# !! Since arrays themselves have a property named .Count,
# !! member enumeration does NOT occur here.
> ([pscustomobject] @{ count = 10 }, [pscustomobject] @{ count = 20 }).Count
2  # !! The *array's* count property was accessed, returning the count of elements

In the case of using dot notation with [xml] (PowerShell-decorated System.Xml.XmlDocument and System.Xml.XmlElement instances), the PowerShell-added, incidental properties shadow the type-intrinsic ones:[2]

While this behavior is easy to grasp, the fact that the outcome depends on the specific input can also be treacherous:

For instance, in the following example the incidental name child element shadows the intrinsic property of the same name on the element itself:

> ([xml] '<xml><child>foo</child></xml>').xml.Name
xml  # OK: The element's *own* name
> ([xml] '<xml><name>foo</name></xml>').xml.Name
foo  # !! .name was interpreted as the incidental *child* element

If you do need to gain access to the intrinsic type's properties, use .get_<property-name>():

> ([xml] '<xml><name>foo</name></xml>').xml.get_Name()
xml  # OK - intrinsic property value, thanks to use of .get_*()

[1] If a given element has both an attribute and an element by the same name, PowerShell reports both, as the elements of an array ([object[]]).

[2] Seemingly, when PowerShell adapts the underlying System.Xml.XmlElement type behind the scenes, it doesn't expose its properties as such, but via get_* accessor methods, which still allows access as if they were properties, but with the PowerShell-added incidental-but-bona-fide properties taking precedence. Do let us know if you know more about this.