Read UTF-8 files correctly with PowerShell


If the file is supposed to be UTF-8, why not tell PowerShell to decode it as UTF-8 when reading it:

Get-Content -Path test.txt -Encoding UTF8
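To see the effect of the -Encoding parameter in isolation, here is a minimal sketch that writes a small UTF-8 file without a BOM and reads it back. The file name and the $sample and $utf8NoBom variables are made up for illustration; the non-ASCII characters are built from code points so the result does not depend on how the script file itself is saved.

```powershell
# Hypothetical sample file in the temp directory (name is illustrative only).
$path = Join-Path ([IO.Path]::GetTempPath()) 'utf8-sample.txt'

# Build "ABC" followed by U+00C5, U+00C4, U+00D6 from code points, so the
# script source contains only ASCII.
$sample = 'ABC' + [char]0xC5 + [char]0xC4 + [char]0xD6

# Write UTF-8 without a BOM, so nothing in the file announces its encoding.
$utf8NoBom = New-Object System.Text.UTF8Encoding $false
[IO.File]::WriteAllText($path, $sample, $utf8NoBom)

# Tell Get-Content how to decode the bytes instead of relying on the default.
$text = Get-Content -Path $path -Encoding UTF8
$text -eq $sample
```

If you drop -Encoding UTF8 from the last Get-Content call in Windows PowerShell, the comparison is expected to fail, because the bytes are then decoded with the system code page.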


JPBlanc is right: if you want the file read as UTF-8, specify that when you read it.

On a side note, you're losing formatting with the [String]+[String] concatenation, and your regex match doesn't work. Take a look at the changes to the regex search, to $newMsgs, and to the way the data is written to the file.

# Read data if exists
$data = ""
$startRev = 1;
if (Test-Path test.txt)
{
    $data = Get-Content -Path test.txt #-Encoding UTF8
    if($data -match "\br([0-9]+)\b"){
        $startRev = [int]([regex]::Match($data,"\br([0-9]+)\b")).groups[1].value + 1
    }
}
Write-Host Next revision is $startRev

# Define example data to add
$startRev = $startRev + 10
$newMsgs = @"
2014-04-01 - r$startRev`r`n`r`n    Line 1`r`n    Line 2`r`n`r`n
"@

# Write new data back
$newmsgs,$data | Out-File test.txt -Encoding UTF8
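The revision lookup can also be written so that the regex runs only once and its result is checked through the Success property. This is a sketch, not the original poster's code: the $data string below is made-up sample text in the same shape as the log entries, and $m is an illustrative name. With a real file, the string could come from Get-Content -Path test.txt -Raw -Encoding UTF8 (the -Raw switch needs PowerShell 3.0 or later and returns the file as one string rather than an array of lines).

```powershell
# Illustrative log text in the same shape as the entries written above.
$data = "2014-04-01 - r12`r`n`r`n    Line 1`r`n    Line 2`r`n`r`n"

# Default used when no revision marker is found.
$startRev = 1

# Match once, then inspect the result instead of matching a second time.
$m = [regex]::Match($data, '\br([0-9]+)\b')
if ($m.Success) {
    # Group 1 holds the digits after the "r"; the next revision is one higher.
    $startRev = [int]$m.Groups[1].Value + 1
}
$startRev
```

[regex]::Match returns the first match in the string, so this relies on the newest entry being written at the top of the file, as the script above does.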


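Without the -Encoding parameter, Get-Content doesn't seem to handle UTF-8 files without a BOM at all: Windows PowerShell falls back to the system's ANSI code page, so non-ASCII characters such as ÅÄÖ come out garbled. [System.IO.File]::ReadLines, which decodes as UTF-8 by default, seems to be an alternative. Examples: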

PS C:\temp\powershellutf8> $a = Get-Content .\utf8wobom.txt
PS C:\temp\powershellutf8> $b = Get-Content .\utf8wbom.txt
PS C:\temp\powershellutf8> $a2 = Get-Content .\utf8wbom.txt -Encoding UTF8
PS C:\temp\powershellutf8> $a
ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ  <== This doesn't seem to be right at all
PS C:\temp\powershellutf8> $b
ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ
PS C:\temp\powershellutf8> $a2
ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ
PS C:\temp\powershellutf8>
PS C:\temp\powershellutf8> $c = [IO.File]::ReadLines('.\utf8wbom.txt');
PS C:\temp\powershellutf8> $c
ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ
PS C:\temp\powershellutf8> $d = [IO.File]::ReadLines('.\utf8wobom.txt');
PS C:\temp\powershellutf8> $d
ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ <== Works!
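One thing to watch when calling the .NET file methods from a script: they resolve relative paths against the process working directory, which is not necessarily PowerShell's current location. The sketch below passes a full path and names the encoding explicitly; the file name and variable names are hypothetical, and ReadAllLines is used instead of ReadLines only because it returns an array that is easy to index.

```powershell
# Hypothetical sample file (name and location are illustrative only).
$path = Join-Path ([IO.Path]::GetTempPath()) 'utf8wobom-sample.txt'

# "XYZ" followed by U+00C5, U+00C4, U+00D6, built from code points.
$sample = 'XYZ' + [char]0xC5 + [char]0xC4 + [char]0xD6

# Write two lines as UTF-8 without a BOM.
$utf8NoBom = New-Object System.Text.UTF8Encoding $false
[IO.File]::WriteAllLines($path, [string[]]@($sample, 'second line'), $utf8NoBom)

# Convert-Path turns a PowerShell path into a full file system path,
# which is what the .NET methods should be given.
$fullPath = Convert-Path -Path $path

# Read every line eagerly, decoding explicitly as UTF-8.
$lines = [IO.File]::ReadAllLines($fullPath, [Text.Encoding]::UTF8)
$lines[0] -eq $sample
```

For large files, [IO.File]::ReadLines with the same two arguments enumerates the lines lazily instead of loading them all at once.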