Problem json_encode utf-8 [duplicate]
json_encode() is not actually outputting JSON* there. It's outputting a JavaScript string. (It outputs JSON when you give it an object or an array to encode.) That's fine, as a JavaScript string is what you want.
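A minimal PHP sketch of that distinction (the sample values are made up for illustration): a bare string encodes to a JSON string literal, quotes included, while arrays encode to a JSON array or object.

```php
<?php
// A bare PHP string becomes a JSON string literal (note the surrounding quotes).
$scalar = json_encode('hello');

// A sequential PHP array becomes a JSON array...
$list = json_encode(['hello', 'world']);

// ...and an associative PHP array becomes a JSON object.
$object = json_encode(['greeting' => 'hello']);

echo $scalar, "\n"; // "hello"
echo $list, "\n";   // ["hello","world"]
echo $object, "\n"; // {"greeting":"hello"}
```

Whether the first result counts as "JSON" depends on which definition you follow, as the footnote on RFC 4627 explains.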
In JavaScript (and in JSON), č may be escaped as \u010d. The two are equivalent, so there's nothing wrong with what json_encode() is doing. It should work fine; I'd be very surprised if this is actually causing you any problem. However, if the transfer is safely in a Unicode encoding (UTF-8, usually)†, there's no need for the escaping either. If you want to turn it off, you can do so thus: json_encode('Svrček', JSON_UNESCAPED_UNICODE). Note that the flag JSON_UNESCAPED_UNICODE was introduced in PHP 5.4.0 and is unavailable in earlier versions.
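A short PHP sketch of both behaviours, assuming PHP 5.4.0 or later and a source file saved as UTF-8: the default output uses \u escapes, JSON_UNESCAPED_UNICODE leaves č as raw UTF-8, and json_decode() turns both back into the same string.

```php
<?php
// Default: every non-ASCII character is written as a \uXXXX escape.
$escaped = json_encode('Svrček');

// PHP 5.4.0+: JSON_UNESCAPED_UNICODE writes the UTF-8 bytes unchanged.
$raw = json_encode('Svrček', JSON_UNESCAPED_UNICODE);

echo $escaped, "\n"; // "Svr\u010dek"
echo $raw, "\n";     // "Svrček"

// Both spellings are equivalent JSON and decode to the identical PHP string.
var_dump(json_decode($escaped) === json_decode($raw)); // bool(true)
```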
By the way, contrary to what @onteria_ says, JSON does use UTF-8:
The character encoding of JSON text is always Unicode. UTF-8 is the only encoding that makes sense on the wire, but UTF-16 and UTF-32 are also permitted.
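* Or, at least, it's not outputting JSON as defined in RFC 4627, which requires the top-level value to be an object or an array. However, there are other definitions of JSON by which scalar values are allowed; RFC 7159 and its successor RFC 8259, for example, permit any JSON value, including a bare string, at the top level.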
† JSON may be in UTF-8, UTF-16LE, UTF-16BE, UTF-32LE, or UTF-32BE.
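RFC 4627 (section 3) points out that, because the first two characters of a JSON text are ASCII, a reader can tell these five encodings apart from the pattern of zero bytes in the first four octets. Below is a minimal PHP sketch of that check; the function name detect_json_encoding is hypothetical, and it simply falls back to UTF-8 for inputs shorter than four bytes. It also relies on RFC 4627's object-or-array rule, so a very short bare scalar may be misdetected.

```php
<?php
// Hypothetical helper: guess a JSON text's encoding from the NUL-byte pattern
// in its first four octets, following RFC 4627, section 3.
function detect_json_encoding(string $bytes): string
{
    if (strlen($bytes) < 4) {
        return 'UTF-8'; // too short to tell; assume the common case
    }

    // $z[$i] is true when octet $i is 0x00.
    $z = [];
    for ($i = 0; $i < 4; $i++) {
        $z[] = ($bytes[$i] === "\0");
    }

    if ($z[0] && $z[1] && $z[2] && !$z[3]) {
        return 'UTF-32BE'; // 00 00 00 xx
    }
    if ($z[0] && !$z[1] && $z[2] && !$z[3]) {
        return 'UTF-16BE'; // 00 xx 00 xx
    }
    if (!$z[0] && $z[1] && $z[2] && $z[3]) {
        return 'UTF-32LE'; // xx 00 00 00
    }
    if (!$z[0] && $z[1] && !$z[2] && $z[3]) {
        return 'UTF-16LE'; // xx 00 xx 00
    }
    return 'UTF-8'; // xx xx xx xx
}

// "[]" hand-encoded as UTF-16BE: 00 5B 00 5D
echo detect_json_encoding("\0[\0]"), "\n"; // UTF-16BE
```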
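OK, so after you make the database connection in your PHP script, put this line; it should work (at least it solved my problem):
mysql_query('SET CHARACTER SET utf8');
(Note that the mysql_* functions were removed in PHP 7.0. With mysqli, set the connection character set with $mysqli->set_charset('utf8mb4') instead, where $mysqli is your connection object.)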
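The answer doesn't say why this helps. One likely reason (an assumption about the asker's setup): if the connection hands back text in a legacy single-byte encoding, json_encode() receives bytes that are not valid UTF-8 and fails, returning false. The sketch below simulates that with 'Svrček' as Windows-1250 or ISO-8859-2 would store it (č is the single byte 0xE8), and uses json_last_error() and json_last_error_msg() (PHP 5.5+) to make the failure visible.

```php
<?php
// 'Svrček' in a legacy single-byte encoding: č is the byte 0xE8 in both
// Windows-1250 and ISO-8859-2. This byte sequence is NOT valid UTF-8.
$legacy = "Svr\xE8ek";

$result = json_encode($legacy);

if ($result === false) {
    // json_last_error() is JSON_ERROR_UTF8 here: malformed UTF-8 input.
    echo 'json_encode failed: ', json_last_error_msg(), "\n";
}

// The same text correctly encoded as UTF-8 (č = 0xC4 0x8D) encodes fine.
$utf8 = "Svr\xC4\x8Dek";
echo json_encode($utf8), "\n"; // "Svr\u010dek"
```

Setting the connection character set to UTF-8 makes the database return bytes like $utf8 rather than $legacy, which is why the fix above can make json_encode() start working.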
Yes, json_encode escapes non-ASCII characters. If you decode the result, you'll get your original string back:
$string = "こんにちは";
echo "ENCODING: " . mb_detect_encoding($string) . "\n";
$encoded = json_encode($string);
echo "ENCODED JSON: $encoded\n";
$decoded = json_decode($encoded);
echo "DECODED JSON: $decoded\n";
Output:
ENCODING: UTF-8
ENCODED JSON: "\u3053\u3093\u306b\u3061\u306f"
DECODED JSON: こんにちは
EDIT: It's worth noting that:
JSON uses Unicode exclusively.
The self-documenting format that describes structure and field names as well as specific values;
Source: http://www.json.org/fatfree.html
It uses Unicode, NOT UTF-8. This FAQ explains the difference between UTF-8 and Unicode:
http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
When you use JSON, your non-ASCII characters get escaped as Unicode code points. For example, こ is code point U+3053 and is written as \u3053.
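To make the code-point idea concrete, here is a PHP sketch (PHP 7.0+ for the "\u{...}" string syntax). The helper json_escape_code_point is hypothetical; it spells a code point the way PHP's json_encode() does. こ (U+3053) is three bytes in UTF-8 but a single \u3053 escape, while a character above U+FFFF, such as the emoji U+1F600, is escaped as a UTF-16 surrogate pair.

```php
<?php
// Hypothetical helper: write one Unicode code point as JSON \u escapes.
// Code points up to U+FFFF take one \uXXXX; higher ones take a UTF-16
// surrogate pair, i.e. two \uXXXX escapes.
function json_escape_code_point(int $cp): string
{
    if ($cp <= 0xFFFF) {
        return sprintf('\\u%04x', $cp);
    }
    $v = $cp - 0x10000;
    $high = 0xD800 + ($v >> 10);   // top 10 bits of the offset
    $low  = 0xDC00 + ($v & 0x3FF); // bottom 10 bits of the offset
    return sprintf('\\u%04x\\u%04x', $high, $low);
}

// こ is code point U+3053; stored as UTF-8 it is three bytes.
echo bin2hex("\u{3053}"), "\n";             // e38193
echo json_escape_code_point(0x3053), "\n";  // \u3053
echo json_encode("\u{3053}"), "\n";         // "\u3053"

// U+1F600 lies outside the Basic Multilingual Plane.
echo json_escape_code_point(0x1F600), "\n"; // \ud83d\ude00
echo json_encode("\u{1F600}"), "\n";        // "\ud83d\ude00"
```

The escape carries the code point (or its UTF-16 surrogates), not the UTF-8 bytes, which is the Unicode vs. UTF-8 distinction the FAQ above describes.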