ConvertTo-Json and ConvertFrom-Json with special characters


I decided not to use Unescape; instead, I replace the Unicode \uxxxx escape sequences with the characters they represent, and now it works properly:

    $fileContent = @"
    {
        "something":  "http://domain/?x=1&y=2",
        "pattern":  "^(?!(\\`|\\~|\\!|\\@|\\#|\\$|\\||\\\\|\\'|\\\")).*"
    }
    "@
    $fileContent | ConvertFrom-Json | ConvertTo-Json | %{
        [Regex]::Replace($_,
            "\\u(?<Value>[a-zA-Z0-9]{4})", {
                param($m) ([char]([int]::Parse($m.Groups['Value'].Value,
                    [System.Globalization.NumberStyles]::HexNumber))).ToString() } )}

Which generates the expected output:

    {
        "something":  "http://domain/?x=1&y=2",
        "pattern":  "^(?!(\\|\\~|\\!|\\@|\\#|\\$|\\||\\\\|\\'|\\\")).*"
    }


If you don't want to rely on Regex (from @Reza Aghaei's answer), you could import the Newtonsoft JSON library. The benefit is the default StringEscapeHandling property which escapes control characters only. Another benefit is avoiding the potentially dangerous string replacements you would be doing with Regex.

This escape handling is also the default in PowerShell Core (version 6 and up), which has used Newtonsoft internally since then. So another alternative is to use ConvertFrom-Json and ConvertTo-Json from PowerShell Core.

Your code would look something like this if you import the Newtonsoft JSON library:

    # LoadFile requires an absolute path, so resolve the DLL's location first
    [Reflection.Assembly]::LoadFile((Resolve-Path "Newtonsoft.Json.dll").ProviderPath)

    $json = Get-Content -Raw -Path file.json -Encoding UTF8 # read file
    $unescaped = [Newtonsoft.Json.Linq.JObject]::Parse($json) # similar to ConvertFrom-Json

    $escapedElementValue = [Newtonsoft.Json.JsonConvert]::ToString($unescaped.apiName.Value) # similar to ConvertTo-Json
    $escapedCompleteJson = [Newtonsoft.Json.JsonConvert]::SerializeObject($unescaped) # similar to ConvertTo-Json

    Write-Output "Variable passed = $escapedElementValue"
    Write-Output "Same JSON as Input = $escapedCompleteJson"


tl;dr

The problem does not affect PowerShell (Core) 6+ (the install-on-demand, cross-platform PowerShell edition), which uses a different implementation of the ConvertTo-Json and ConvertFrom-Json cmdlets that, as of PowerShell 7.2, is based on Newtonsoft.Json (whose direct use is shown in r3verse's answer). There, your sample roundtrip command works as expected.

Only ConvertTo-Json in Windows PowerShell is affected (the bundled-with-Windows PowerShell edition whose latest and final version is 5.1). But note that the JSON representation - while unexpected - is technically correct.

A simple, but robust solution focused only on unescaping those Unicode escape sequences that ConvertTo-Json unexpectedly creates - namely for & ' < > - while ruling out false positives:

    # The following sample JSON with undesired Unicode escape sequences for `& < > '`, was
    # created with Windows PowerShell's ConvertTo-Json as follows:
    #   ConvertTo-Json "Ten o'clock at <night> & later. \u0027 \\u0027"
    # Note that \u0027 and \\u0027 are NOT Unicode escape sequences and must not be
    # interpreted as such.
    # The *desired* JSON representation - without the unexpected escaping - would be:
    #   "Ten o'clock at <night> & later. \\u0027 \\\\u0027"
    $json = '"Ten o\u0027clock at \u003Cnight\u003e \u0026 later. \\u0027 \\\\u0027"'

    [regex]::replace(
      $json,
      '(?<=(?:^|[^\\])(?:\\\\)*)\\u(00(?:26|27|3c|3e))',
      { param($match) [char] [int] ('0x' + $match.Groups[1].Value) },
      'IgnoreCase'
    )

The above outputs the desired JSON representation, without the unnecessary escaping:

"Ten o'clock at <night> & later. \\u0027 \\\\u0027"

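For use in a pipeline, the replacement above could be wrapped in a small function. This is only a sketch: the name Convert-JsonAsciiEscapes is made up for illustration, and the regex is the same one as above, so it touches only the escape sequences for & ' < > and leaves everything else alone.

```powershell
# Hypothetical helper (the name is an assumption, not a built-in cmdlet):
# unescapes only \u0026, \u0027, \u003c and \u003e in a JSON string.
function Convert-JsonAsciiEscapes {
  param(
    [Parameter(Mandatory, ValueFromPipeline)]
    [string] $Json
  )
  process {
    [regex]::Replace(
      $Json,
      # The lookbehind requires an even number of preceding backslashes,
      # which rules out false positives such as \\u0027.
      '(?<=(?:^|[^\\])(?:\\\\)*)\\u(00(?:26|27|3c|3e))',
      { param($match) [string] [char] [int] ('0x' + $match.Groups[1].Value) },
      'IgnoreCase'
    )
  }
}

[pscustomobject] @{ something = 'http://domain/?x=1&y=2' } |
  ConvertTo-Json -Compress |
  Convert-JsonAsciiEscapes
# → {"something":"http://domain/?x=1&y=2"}
```

In PowerShell (Core) 6+ the function should simply pass the JSON through unchanged, because ConvertTo-Json does not create these escape sequences there.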
Background information:

ConvertTo-Json in Windows PowerShell unexpectedly represents the following ASCII-range characters by their Unicode escape sequences in JSON strings:

  • & (Unicode escape sequence: \u0026)
  • ' (\u0027)
  • < and > (\u003c and \u003e)

There's no good reason to do so (these characters only require escaping in HTML/XML text).

However, any compliant JSON parser - including ConvertFrom-Json - converts these escape sequences back to the characters they represent.

In other words: While the JSON text created by Windows PowerShell's ConvertTo-Json is unexpected and can impede readability, it is technically correct and - while not identical - equivalent to the original representation in terms of the data it represents.
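The equivalence can be checked with a quick roundtrip. The following is a minimal sketch; the property name text is chosen purely for illustration.

```powershell
$original = "Ten o'clock at <night> & later."

# Serialize. Windows PowerShell 5.1 emits \u0027, \u003c, \u003e and \u0026 here,
# whereas PowerShell (Core) 6+ keeps the characters verbatim.
$json = [pscustomobject] @{ text = $original } | ConvertTo-Json -Compress
$json

# Either way, parsing the JSON restores the original string exactly.
($json | ConvertFrom-Json).text -ceq $original   # → True
```

Only the textual JSON representation differs between the two editions; the data that a parser reads back is the same.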


Fixing the readability problem:

As an aside: While [regex]::Unescape(), whose purpose is to unescape regexes only, also converts Unicode escape sequences to the characters they represent, it is fundamentally unsuited to selectively unescaping Unicode escape sequences in JSON strings, given that all other \ escapes must be preserved in order for the JSON string to remain syntactically valid.

While your answer works well in general, it has limitations (aside from the easily corrected problem that a-zA-Z should be a-fA-F to limit matching to those letters that are valid hex. digits):

  • It doesn't rule out false positives, such as \\u0027 or \\\\u0027 (\\ escapes \, so that the u0027 part becomes a verbatim string and must not be treated as an escape sequence).

  • It converts all Unicode escape sequences, which presents two problems:

    • Escape sequences representing characters that require escaping would also be converted to their verbatim characters, which would break the JSON. For instance, \u005c would become a bare \, which itself requires escaping.

    • For non-BMP Unicode characters that must be represented as pairs of Unicode escape sequences (so-called surrogate pairs), your solution would mistakenly try to unescape each half of the pair separately.
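The false-positive problem and the requires-escaping problem are easy to reproduce. The sketch below uses a hypothetical helper, Convert-AllUnicodeEscapes, that mimics the blanket approach (with the hex-digit range already corrected); it is written for illustration only and is not the exact code under discussion.

```powershell
# Hypothetical helper mimicking the blanket approach: unescape EVERY \uXXXX.
function Convert-AllUnicodeEscapes {
  param([string] $Json)
  [regex]::Replace($Json, '\\u([0-9a-fA-F]{4})', {
    param($m) [string] [char] [Convert]::ToInt32($m.Groups[1].Value, 16)
  })
}

# False positive: in this JSON, \\ is an escaped backslash followed by the
# verbatim text u0027, yet the helper treats \u0027 as an escape sequence.
Convert-AllUnicodeEscapes '{"v":"\\u0027"}'              # → {"v":"\'"}

# A character that requires escaping: \u0022 is a double quote, so the
# unescaped result terminates the JSON string early.
Convert-AllUnicodeEscapes '{"v":"say \u0022hi\u0022"}'   # → {"v":"say "hi""}
```

Neither result is valid JSON any longer: \' is not a recognized JSON escape, and the unescaped double quotes end the string prematurely.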

For a robust solution that overcomes these limitations, see this answer (surrogate pairs are left as Unicode escape sequences, Unicode escape sequences whose characters require escaping are converted to \-based (C-style) escapes, such as \n, if possible).

However, if the only requirement is to unescape those Unicode escape sequences that Windows PowerShell's ConvertTo-Json unexpectedly creates, the solution at the top is sufficient.