
Using querySelectorAll on an mshtml.HTMLDocumentClass object in PowerShell causes a crash


I ran into this problem, too, and posted about it on Reddit. I believe the problem happens when PowerShell tries to enumerate the HTML DOM NodeList object returned by querySelectorAll(). The same type of object is returned by childNodes(), which PowerShell can enumerate, so I'm guessing there's some glue code written for .ParsedHtml.childNodes but not for .ParsedHtml.querySelectorAll(). The crash can also be triggered by IntelliSense trying to get tab-completion help for the object.

I found a way around it, though! Just access the native DOM members .item() and .length directly and emit the node objects into a PowerShell array. The following code pulls the newest page of posts from /r/PowerShell, gets the post list anchors via querySelectorAll(), then manually enumerates them with the native DOM members into a PowerShell-native array.

    $Result = Invoke-WebRequest -Uri "https://www.reddit.com/r/PowerShell/new/"
    $NodeList = $Result.ParsedHtml.querySelectorAll("#siteTable div div p.title a")
    $PsNodeList = @()
    for ($i = 0; $i -lt $NodeList.Length; $i++) {
        $PsNodeList += $NodeList.item($i)
    }
    $PsNodeList | ForEach-Object {
        $_.InnerHtml
    }

Edit: .Length seems to work capitalized or lower-case. I would have expected the DOM to be case-sensitive, so either something is helping to translate the name or I'm misunderstanding something. Also, the CSS selector is grabbing the source links (self.PowerShell mostly), but that is my CSS selector logic error, not a problem with querySelectorAll(). Note that the results of querySelectorAll() are not live, so modifying them won't modify the original DOM. I haven't tried modifying them or using their methods yet, but clearly we can at the very least read .InnerHtml.

Edit 2: Here is a more-generalized wrapper function:

    function Get-FixedQuerySelectorAll {
        param (
            $HtmlWro,
            $CssSelector
        )
        # After assignment, $NodeList will crash powershell if enumerated in any way including Intellisense-completion while coding!
        $NodeList = $HtmlWro.ParsedHtml.querySelectorAll($CssSelector)
        for ($i = 0; $i -lt $NodeList.length; $i++) {
            Write-Output $NodeList.item($i)
        }
    }

$HtmlWro is an HTML Web Response Object, the output of Invoke-WebRequest. I originally tried to pass .ParsedHtml itself, but then it would crash on assignment. Doing it this way returns the nodes in a PowerShell array.
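One caveat about the "returns the nodes in a PowerShell array" part, sketched below with a stand-in rather than a real web page: a function that emits items with Write-Output hands the caller an array only when two or more items come back. A single match arrives as a bare object, and no match arrives as $null. Get-StandInNodes is a hypothetical helper invented for this illustration. It emits strings the same way the wrapper emits nodes, so the behavior can be checked without mshtml.

```powershell
# Hypothetical stand-in: emits items one at a time with Write-Output,
# the same way Get-FixedQuerySelectorAll emits DOM nodes.
function Get-StandInNodes {
    param ([int] $Count)
    for ($i = 0; $i -lt $Count; $i++) {
        Write-Output "node$i"
    }
}

$one  = Get-StandInNodes -Count 1   # a single [string], not an array
$many = Get-StandInNodes -Count 3   # an [object[]] with three elements

# Wrapping the call in @() yields an array no matter how many items come back.
$safeNone = @(Get-StandInNodes -Count 0)
$safeOne  = @(Get-StandInNodes -Count 1)

$one -is [array]    # False
$many -is [array]   # True
$safeNone.Count     # 0
$safeOne.Count      # 1
```

So when calling the wrapper with a selector that might match zero or one element, something like @(Get-FixedQuerySelectorAll $HtmlWro $CssSelector) should keep later code that indexes or counts the result from being surprised.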


@midnightfreddie's solution worked fine for me before, but now it throws "Exception from HRESULT: 0x80020101" when calling $NodeList.item($i).

I found the following workaround:

    function Invoke-QuerySelectorAll($node, [string] $selector)
    {
        $nodeList = $node.querySelectorAll($selector)
        $nodeListType = $nodeList.GetType()
        $result = @()
        for ($i = 0; $i -lt $nodeList.length; $i++)
        {
            $result += $nodeListType.InvokeMember("item", [System.Reflection.BindingFlags]::InvokeMethod, $null, $nodeList, $i)
        }
        return $result
    }

This one works for New-Object -ComObject InternetExplorer.Application as well.
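Since the direct .item($i) call apparently works on some setups and fails on others, the two approaches could be combined: try the direct call first and fall back to InvokeMember only if it throws. This is a rough sketch, not something verified against a real mshtml object. Get-NodeListItem is a hypothetical name, and the stand-in object at the bottom exists only so the snippet runs without Internet Explorer. It exercises the direct path only, so the fallback branch is untested here and rests on the workaround above.

```powershell
# Hypothetical helper: direct call first, reflection-based call as a fallback.
function Get-NodeListItem {
    param ($NodeList, [int] $Index)
    try {
        return $NodeList.item($Index)
    }
    catch {
        # Same InvokeMember call as in the workaround above.
        $flags = [System.Reflection.BindingFlags]::InvokeMethod
        return $NodeList.GetType().InvokeMember('item', $flags, $null, $NodeList, @($Index))
    }
}

# Stand-in for a NodeList (assumption for illustration only): exposes just
# .length and .item(i), so only the direct path is exercised here.
$standIn = [pscustomobject]@{ length = 2 }
$standIn | Add-Member -MemberType ScriptMethod -Name item -Value {
    param ($i)
    @('first', 'second')[$i]
}

$items = for ($i = 0; $i -lt $standIn.length; $i++) {
    Get-NodeListItem -NodeList $standIn -Index $i
}
$items
```

With a real document, $standIn would be replaced by the result of $node.querySelectorAll($selector).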