Parse local HTML file
It appears that Invoke-WebRequest loads file protocol URIs just fine, but fails to parse them, even in PowerShell 4.0 (where it is officially supported).
An alternative that does not require setting up a website is to load the HTML directly into MSHTML and let it do the parsing:
$html = New-Object -ComObject "HTMLFile"
$source = Get-Content -Path "file.html" -Raw
$html.IHTMLDocument2_write($source)
$html.links.length
Note that when I tested this, a single <meta http-equiv="X-UA-Compatible" content="IE=edge" /> header prevented my HTML from parsing, and I have no idea why: the document had similar XHTML-style headers and MSHTML had no issues with those.
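For reuse, the MSHTML approach can be wrapped in a small function. This is only a sketch: the name Get-LocalHtmlLink is made up for illustration, it needs Windows (the HTMLFile COM object is not available elsewhere), and the fallback to write() with Unicode bytes is an assumption based on a commonly reported workaround for machines where IHTMLDocument2_write is not exposed.

```powershell
# Hypothetical helper (not part of the answer above): returns the href of
# every link in a local HTML file, parsed by MSHTML. Windows only.
function Get-LocalHtmlLink {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Path
    )

    # -Raw reads the file as a single string instead of an array of lines.
    $source = Get-Content -Path $Path -Raw

    $html = New-Object -ComObject 'HTMLFile'
    try {
        # Same call as in the answer above.
        $html.IHTMLDocument2_write($source)
    }
    catch {
        # Assumption: where IHTMLDocument2_write is missing, the generic
        # write() method accepts the markup as Unicode bytes.
        $html.write([System.Text.Encoding]::Unicode.GetBytes($source))
    }
    $html.Close()

    # Emit one href string per link found in the document.
    foreach ($link in $html.links) {
        $link.href
    }
}
```

Usage would then be something like Get-LocalHtmlLink -Path "file.html", which writes one string per link to the pipeline.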
You can use the file with a web server to get around the dumb limitation of Invoke-WebRequest:
PS > $foo = Invoke-WebRequest http://localhost:8080/example.htm
PS > $foo.Links.Count
1
Note that this works even with no internet connection. For example, with the network down, a remote request fails:
PS > Invoke-WebRequest http://example.com
Invoke-WebRequest : The remote name could not be resolved: 'example.com'
Use file-link format
$foo = Invoke-WebRequest "file://<path-to-file>"
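If you are unsure how to write the file:// form of a path, System.Uri can build it for you. A minimal sketch, assuming the file sits in the current directory; the name file.html is just a placeholder:

```powershell
# 'file.html' is a placeholder; point this at the file you want to load.
$path = Join-Path (Get-Location).Path 'file.html'

# System.Uri turns an absolute file-system path into a file:// URI and
# percent-encodes characters such as spaces.
$uri = (New-Object System.Uri $path).AbsoluteUri

# Show the resulting URI; it can then be passed to Invoke-WebRequest.
$uri
```

On Windows, a path like C:\temp\file.html comes out as file:///C:/temp/file.html.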
If the HTML is valid XML, you can use Select-Xml:
[xml]$html = Get-Content '<path_to_html_file>'
Select-Xml $html -XPath '//a' | foreach {$_.node}
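One thing to watch for with the Select-Xml route: if the document declares the XHTML namespace on its root element, a plain '//a' matches nothing, and the XPath needs a prefix mapped through -Namespace. A small sketch with inline markup (the sample document and the prefix name x are made up for illustration):

```powershell
# Illustrative markup; a real file would be read with Get-Content instead.
$source = @'
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <a href="one.html">One</a>
    <a href="two.html">Two</a>
  </body>
</html>
'@

[xml]$html = $source

# Without a namespace mapping, '//a' does not match namespaced elements.
$plain = @(Select-Xml -Xml $html -XPath '//a')

# Map an arbitrary prefix to the XHTML namespace and use it in the XPath.
$ns = @{ x = 'http://www.w3.org/1999/xhtml' }
$links = @(Select-Xml -Xml $html -XPath '//x:a' -Namespace $ns |
    ForEach-Object { $_.Node })

$plain.Count                           # 0
$links.Count                           # 2
$links | ForEach-Object { $_.href }    # one.html, two.html
```

If the file has no xmlns declaration, the unprefixed '//a' from the answer above is enough.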