
Decode or unescape \u00f0\u009f\u0091\u008d to 👍


The Unicode code point of the 👍 character is U+1F44D.

Using the variable-length UTF-8 encoding, the following 4 bytes (expressed as hex. numbers) are needed to represent this code point: F0 9F 91 8D.
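You can verify this from PowerShell itself by asking .NET for the UTF-8 encoding of the code point (just a sanity check, not part of the solution):

$thumbsUp = [char]::ConvertFromUtf32(0x1F44D)   # the 👍 character
[Text.Encoding]::UTF8.GetBytes($thumbsUp) | ForEach-Object { $_.ToString('X2') }   # -> F0, 9F, 91, 8D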

While these bytes are recognizable in your string,

$str = "\u00f0\u009f\u0091\u008d"

they shouldn't be represented as \u escape codes, because they're not Unicode code units / code points, they're bytes.
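To see why that matters: a JSON parser treats each of those escapes as its own code point in the U+0080 to U+00FF range, not as a byte, so parsing the input as-is produces mojibake rather than the emoji (shown here only to illustrate the problem):

'{ "str": "\u00f0\u009f\u0091\u008d" }' | ConvertFrom-Json |
  ForEach-Object { [int[]] [char[]] $_.str }   # -> 240, 159, 145, 141 as *code points*, i.e. "ð" plus 3 control chars.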

With 4-hex-digit escape sequences (which represent UTF-16 code units), the proper representation requires two 16-bit code units, a so-called surrogate pair, which together represent the single non-BMP code point U+1F44D:

$str = "\uD83D\uDC4D"

If your JSON input used such proper Unicode escapes, PowerShell would process the string correctly; e.g.:

'{ "str": "\uD83D\uDC4D" }' | ConvertFrom-Json > out.txt

If you examine file out.txt, you'll see something like:

str
---
👍

(The output was sent to a file because console windows don't render the 👍 character correctly, at least not without additional configuration; note that if you use PowerShell Core on Linux or macOS, terminal output does work.)


Therefore, the best solution would be to correct the problem at the source and use proper Unicode escapes (or even use the characters themselves, as long as the source supports any of the standard Unicode encodings).

If you really must parse the broken representation, try the following workaround (PSv4+), building on your own [regex]::Replace() technique:

$str = "A \u00f0\u009f\u0091\u008d for Mot\u00c3\u00b6rhead."[regex]::replace($str, '(?:\\u[0-9a-f]{4})+', { param($m)   $utf8Bytes = (-split ($m.Value -replace '\\u([0-9a-f]{4})', '0x$1 ')).ForEach([byte])  [text.encoding]::utf8.GetString($utf8Bytes)})

This should yield: A 👍 for Motörhead.

The above translates sequences of \u... escapes into the byte values they represent and interprets the resulting byte array as UTF-8 text.
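Step by step, this is what the replacement script block does for a single run of escapes (shown here with just the emoji's escapes, purely for illustration):

$escaped = '\u00f0\u009f\u0091\u008d'

# Turn each \uXXXX escape into a whitespace-separated '0xXXXX' token.
$hexTokens = $escaped -replace '\\u([0-9a-f]{4})', '0x$1 '   # -> '0x00f0 0x009f 0x0091 0x008d '

# Split on whitespace and convert each token to a byte value.
$utf8Bytes = (-split $hexTokens).ForEach([byte])             # -> 0xF0, 0x9F, 0x91, 0x8D

# Decode the resulting byte array as UTF-8 text.
[Text.Encoding]::UTF8.GetString($utf8Bytes)                  # -> 👍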


To save the decoded string to a UTF-8 file, use:

... | Set-Content -Encoding utf8 out.txt

Alternatively, in PSv5+, as Dennis himself suggests, you can make Out-File, and therefore its virtual alias >, default to UTF-8 via PowerShell's global parameter-defaults hashtable:

$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'

Note, however, that in Windows PowerShell (as opposed to PowerShell Core) you'll get a UTF-8 file with a BOM in both cases; avoiding that requires direct use of the .NET Framework: see Using PowerShell to write a file in UTF-8 without the BOM.
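For reference, a minimal sketch of that .NET approach (the $decoded variable and the out.txt file name are placeholders, not part of the original answer):

# Create a UTF-8 encoding *without* a BOM and write the file directly via .NET.
$utf8NoBom = New-Object System.Text.UTF8Encoding $false
[System.IO.File]::WriteAllText((Join-Path $PWD 'out.txt'), $decoded, $utf8NoBom)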