dotnet core System.Text.Json unescape unicode string


You need to configure the JsonSerializerOptions so that the serializer does not escape those characters.

    JsonSerializerOptions jso = new JsonSerializerOptions();
    jso.Encoder = System.Text.Encodings.Web.JavaScriptEncoder.UnsafeRelaxedJsonEscaping;

Then pass these options when you call the Serialize method.

var s = JsonSerializer.Serialize(a, jso);        

Full code:

    using System;
    using System.Text.Json;

    JsonSerializerOptions jso = new JsonSerializerOptions();
    jso.Encoder = System.Text.Encodings.Web.JavaScriptEncoder.UnsafeRelaxedJsonEscaping;

    // A is the class from the question, with a string Name property.
    var a = new A { Name = "你好" };
    var s = JsonSerializer.Serialize(a, jso);
    Console.WriteLine(s);

Result:

[Image: screenshot of the program output]

If you need to print the result in the console, you may need to install additional language support for the characters to display correctly; the original answer links to instructions for this.


To change the escaping behavior of the JsonSerializer, you can pass in a custom JavaScriptEncoder by setting the Encoder property on the JsonSerializerOptions.

https://docs.microsoft.com/en-us/dotnet/api/system.text.json.jsonserializeroptions.encoder?view=netcore-3.0#System_Text_Json_JsonSerializerOptions_Encoder

The default behavior is designed with security in mind and the JsonSerializer over-escapes for defense-in-depth.
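To illustrate that the default escaping only changes the serialized text and not the data itself, here is a minimal sketch. The `Person` and `RoundTripDemo` names are hypothetical and not from the original answers; `Person` stands in for the question's class `A`.

```csharp
using System;
using System.Text.Json;

// Hypothetical type for illustration; stands in for the question's class A.
public class Person
{
    public string Name { get; set; }
}

public static class RoundTripDemo
{
    // Serializes with the default (escaping) encoder, then reads the value back.
    public static string RoundTrip(string name)
    {
        string json = JsonSerializer.Serialize(new Person { Name = name });
        Console.WriteLine(json); // non-ASCII characters appear as \uXXXX escapes
        return JsonSerializer.Deserialize<Person>(json).Name;
    }

    public static void Main()
    {
        string original = "你好";
        // The deserialized value should equal the original string.
        Console.WriteLine(RoundTrip(original) == original);
    }
}
```

So if the JSON is only consumed by another JSON parser, the escaped form is usually harmless; changing the encoder mainly matters when the raw text has to be human-readable.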

If all you are looking for is to avoid escaping certain "alphanumeric" characters of a specific non-Latin language, I would recommend that you create a JavaScriptEncoder using the Create factory method rather than using the UnsafeRelaxedJsonEscaping encoder.

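    using System;
    using System.Text.Encodings.Web;
    using System.Text.Json;
    using System.Text.Unicode;

    // Allow only Basic Latin and CJK ideographs to be written unescaped.
    JsonSerializerOptions options = new JsonSerializerOptions
    {
        Encoder = JavaScriptEncoder.Create(UnicodeRanges.BasicLatin, UnicodeRanges.CjkUnifiedIdeographs)
    };

    var a = new A { Name = "你好" };
    var s = JsonSerializer.Serialize(a, options);
    Console.WriteLine(s);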

Doing so keeps certain safeguards; for instance, HTML-sensitive characters will continue to be escaped.
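A rough way to see the difference between the two encoders is to serialize a string that contains both CJK text and HTML-sensitive characters. The sketch below uses a hypothetical `EncoderComparison` class and helper names that are not part of the original answer; the input string is made up for illustration.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Unicode;

public static class EncoderComparison
{
    // Allows Basic Latin and CJK ideographs; HTML-sensitive characters stay escaped.
    public static string SerializeWithRanges(string value)
    {
        var options = new JsonSerializerOptions
        {
            Encoder = JavaScriptEncoder.Create(UnicodeRanges.BasicLatin, UnicodeRanges.CjkUnifiedIdeographs)
        };
        return JsonSerializer.Serialize(value, options);
    }

    // Minimal escaping: HTML-sensitive characters are written literally.
    public static string SerializeRelaxed(string value)
    {
        var options = new JsonSerializerOptions
        {
            Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping
        };
        return JsonSerializer.Serialize(value, options);
    }

    public static void Main()
    {
        string input = "<你好>"; // made-up input mixing CJK text with '<' and '>'
        Console.WriteLine(SerializeWithRanges(input)); // '<' and '>' should appear as \uXXXX escapes
        Console.WriteLine(SerializeRelaxed(input));    // '<' and '>' should appear literally
    }
}
```

In both cases the CJK characters should be written as-is; only the treatment of the angle brackets differs.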

I would caution against using System.Text.Encodings.Web.JavaScriptEncoder.UnsafeRelaxedJsonEscaping lightly, since it does minimal escaping (which is why it has "unsafe" in the name). If the JSON you are creating is written to a UTF-8 encoded file on disk, or if it is part of a web request that explicitly sets the charset to utf-8 (and will not be embedded as-is within an HTML component), then it is probably OK to use.

See the remarks section within the API docs: https://docs.microsoft.com/en-us/dotnet/api/system.text.encodings.web.javascriptencoder.unsaferelaxedjsonescaping?view=netcore-3.0#remarks

You could also consider specifying UnicodeRanges.All if you expect/need all languages to remain un-escaped. This still escapes certain ASCII characters that are prone to security vulnerabilities.

    JsonSerializerOptions options = new JsonSerializerOptions
    {
        Encoder = JavaScriptEncoder.Create(UnicodeRanges.All)
    };

For more information and code samples, including a caution note on relaxed escaping, see: https://docs.microsoft.com/en-us/dotnet/standard/serialization/system-text-json-how-to?view=netcore-3.0#customize-character-encoding


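You can use System.Text.RegularExpressions.Regex.Unescape(string) to unescape the Unicode characters after serialization: https://docs.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex.unescape

Note that this operates on the whole serialized string, so escaped quotes or backslashes inside values are unescaped as well, which can leave the result invalid as JSON.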

Updating example from original question:

    using System;
    using System.Text.Json;

    public class Program
    {
        public static void Main()
        {
            var a = new A { Name = "你好" };
            var s = JsonSerializer.Serialize(a);
            var unescaped = System.Text.RegularExpressions.Regex.Unescape(s);
            Console.WriteLine(s);
            Console.WriteLine(unescaped);
        }
    }

    class A
    {
        public string Name { get; set; }
    }

Output:

    {"Name":"\u4F60\u597D"}
    {"Name":"你好"}
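One thing to check before relying on this approach is whether your values can contain characters such as double quotes. The sketch below is an illustration under that assumption, using a hypothetical `UnescapeCaveat` class and a made-up value; it tests whether the text still parses as JSON before and after Regex.Unescape.

```csharp
using System;
using System.Text.Json;
using System.Text.RegularExpressions;

public static class UnescapeCaveat
{
    // Returns true if the text parses as JSON, false if the parser rejects it.
    public static bool IsValidJson(string text)
    {
        try
        {
            using (JsonDocument.Parse(text))
            {
                return true;
            }
        }
        catch (JsonException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // Made-up value containing double quotes, which the serializer escapes.
        string json = JsonSerializer.Serialize(new { Name = "say \"hi\"" });
        string unescaped = Regex.Unescape(json);

        Console.WriteLine(IsValidJson(json));      // the serializer output parses
        Console.WriteLine(IsValidJson(unescaped)); // the quotes inside the value are now bare
    }
}
```

If the unescaped text is only meant for display or logging this may not matter, but if it has to remain parseable, setting the Encoder option is likely the safer route.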