
Decode or unescape \u00f0\u009f\u0091\u008d to 👍


The Unicode code point of the 👍 character is U+1F44D.

Using the variable-length UTF-8 encoding, the following 4 bytes (expressed as hex. numbers) are needed to represent this code point: F0 9F 91 8D.
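You can verify this from PowerShell itself by asking .NET for the UTF-8 encoding of the code point (just a sanity check, not part of the solution):

$thumbsUp = [char]::ConvertFromUtf32(0x1F44D)   # the 👍 character
[Text.Encoding]::UTF8.GetBytes($thumbsUp) | ForEach-Object { $_.ToString('X2') }   # -> F0, 9F, 91, 8D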

While these bytes are recognizable in your string,

$str = "\u00f0\u009f\u0091\u008d"

they shouldn't be represented as \u escape codes, because they're not Unicode code units / code points, they're bytes.
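To see why that matters: a JSON parser treats each of those escapes as its own code point in the U+0080 to U+00FF range, not as a byte, so parsing the input as-is produces mojibake rather than the emoji (shown here only to illustrate the problem):

'{ "str": "\u00f0\u009f\u0091\u008d" }' | ConvertFrom-Json |
  ForEach-Object { [int[]] [char[]] $_.str }   # -> 240, 159, 145, 141 as *code points*, i.e. "ð" plus 3 control chars.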

With 4-hex-digit escape sequences (which represent UTF-16 code units), the proper representation requires two 16-bit code units, a so-called surrogate pair, which together represent the single non-BMP code point U+1F44D:

$str = "\uD83D\uDC4D"

If your JSON input used such proper Unicode escapes, PowerShell would process the string correctly; e.g.:

'{ "str": "\uD83D\uDC4D" }' | ConvertFrom-Json > out.txt

If you examine file out.txt, you'll see something like:

str
---
👍

(The output was sent to a file because console windows don't render the 👍 character correctly, at least not without additional configuration; note that if you use PowerShell Core on Linux or macOS, terminal output does work.)


Therefore, the best solution would be to correct the problem at the source and use proper Unicode escapes (or even use the characters themselves, as long as the source supports any of the standard Unicode encodings).

If you really must parse the broken representation, try the following workaround (PSv4+), building on your own [regex]::Replace() technique:

$str = "A \u00f0\u009f\u0091\u008d for Mot\u00c3\u00b6rhead."[regex]::replace($str, '(?:\\u[0-9a-f]{4})+', { param($m)   $utf8Bytes = (-split ($m.Value -replace '\\u([0-9a-f]{4})', '0x$1 ')).ForEach([byte])  [text.encoding]::utf8.GetString($utf8Bytes)})

This should yield: A 👍 for Motörhead.

The above translates sequences of \u... escapes into the byte values they represent and interprets the resulting byte array as UTF-8 text.
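Step by step, this is what the replacement script block does for a single run of escapes (shown here with just the emoji's escapes, purely for illustration):

$escaped = '\u00f0\u009f\u0091\u008d'

# Turn each \uXXXX escape into a whitespace-separated '0xXXXX' token.
$hexTokens = $escaped -replace '\\u([0-9a-f]{4})', '0x$1 '   # -> '0x00f0 0x009f 0x0091 0x008d '

# Split on whitespace and convert each token to a byte value.
$utf8Bytes = (-split $hexTokens).ForEach([byte])             # -> 0xF0, 0x9F, 0x91, 0x8D

# Decode the resulting byte array as UTF-8 text.
[Text.Encoding]::UTF8.GetString($utf8Bytes)                  # -> 👍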


To save the decoded string to a UTF-8 file, use:

... | Set-Content -Encoding utf8 out.txt

Alternatively, in PSv5+, as Dennis himself suggests, you can make Out-File, and therefore its virtual alias >, default to UTF-8 via PowerShell's global parameter-defaults hashtable:

$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'

Note, however, that in Windows PowerShell (as opposed to PowerShell Core) you'll get a UTF-8 file with a BOM in both cases; avoiding that requires direct use of the .NET Framework: see Using PowerShell to write a file in UTF-8 without the BOM.
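For reference, a minimal sketch of that .NET approach (the $decoded variable and the out.txt file name are placeholders, not part of the original answer):

# Create a UTF-8 encoding *without* a BOM and write the file directly via .NET.
$utf8NoBom = New-Object System.Text.UTF8Encoding $false
[System.IO.File]::WriteAllText((Join-Path $PWD 'out.txt'), $decoded, $utf8NoBom)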