How to replace/escape U+2028 or U+2029 characters in PHP to stop my JSONP API breaking


You can replace U+2028 and U+2029 with "\u2028" and "\u2029" on the PHP side, the JavaScript side, or both. It doesn't matter as long as it happens at least once (the replacement is idempotent).

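You can just use ordinary string replacement functions. They don't need to be "multibyte safe": UTF-8 is self-synchronizing, so the byte sequence for one character can never match in the middle of another. (The same approach works in UTF-16 or UTF-32, provided matches are aligned to code-unit boundaries.) PHP didn't have Unicode escape sequences last time I checked, which is just one more reason why PHP is a joke (PHP 7.0 has since added the "\u{2028}" syntax in double-quoted strings), but you can use the \x escape with the UTF-8 bytes...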

(In short, there's no multibyte string replace function because it would be redundant: it would behave exactly like the non-multibyte one.)

    // JavaScript
    // A regex with the g flag is needed: a string pattern replaces only the first match.
    data = data.replace(/\u2028/g, "\\u2028").replace(/\u2029/g, "\\u2029");

    // PHP
    $data = str_replace("\xe2\x80\xa8", '\\u2028', $data);
    $data = str_replace("\xe2\x80\xa9", '\\u2029', $data);
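As a rough sketch of the idempotency claim above, the two PHP replacements can be folded into one helper and applied twice. The function name `escape_line_separators` is made up for illustration, and the code assumes the input is UTF-8 encoded JSON text and PHP 7.0 or later (for the type declarations).

```php
<?php
// Hypothetical helper (the name is an assumption, not from the original answer).
// Assumes $json is UTF-8 encoded JSON text.
function escape_line_separators(string $json): string
{
    return str_replace(
        ["\xe2\x80\xa8", "\xe2\x80\xa9"], // raw U+2028 and U+2029 as UTF-8 bytes
        ['\\u2028', '\\u2029'],           // their six-character JSON escapes
        $json
    );
}

// A JSON string literal with a raw U+2028 between "a" and "b".
$raw = "\"a\xe2\x80\xa8b\"";

$once  = escape_line_separators($raw);
$twice = escape_line_separators($once);

var_dump($once === '"a\\u2028b"'); // bool(true)
var_dump($once === $twice);        // bool(true): a second pass changes nothing
```

In valid JSON these two characters can only occur inside string literals, so a blanket replacement over the whole encoded text should not change the meaning of the data.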

Or you could just do nothing at all, since json_encode() in PHP escapes non-ASCII characters by default:

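    // Safe
    echo json_encode("\xe2\x80\xa9");
    // --> "\u2029"

    // Correct JSON, but invalid JavaScript...
    // (Well, technically, JSON root must be array or object under the original
    // RFC 4627; later RFCs allow any value)
    echo json_encode("\xe2\x80\xa9", JSON_UNESCAPED_UNICODE);
    // --> the raw U+2029 character in quotes (before PHP 7.1; later versions still print "\u2029")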
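For a JSONP endpoint specifically, the "do nothing" approach might look like the sketch below. The helper name `jsonp_body` and the callback name `handle` are invented for this example, and the callback check is just one conservative way to validate the name, not something the original answer prescribes.

```php
<?php
// Hypothetical helper: wraps data in a JSONP callback invocation.
function jsonp_body(string $callback, $data): string
{
    // Assumption: only plain identifiers are accepted as callback names.
    if (!preg_match('/^[A-Za-z_$][A-Za-z0-9_$]*$/D', $callback)) {
        throw new InvalidArgumentException('Invalid callback name');
    }
    // No flags: json_encode() escapes all non-ASCII characters,
    // including U+2028 and U+2029, by default.
    return $callback . '(' . json_encode($data) . ');';
}

// The value contains a raw U+2028 between "line" and "break".
echo jsonp_body('handle', ['text' => "line\xe2\x80\xa8break"]), "\n";
// handle({"text":"line\u2028break"});
```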


It’s worth pointing out that this is no longer necessary.

By default, json_encode() escapes all non-ASCII characters (including U+2028 & U+2029), and also escapes the forward slash, even though the JSON spec does not require that. It does no harm to escape it, and it can be safer in certain contexts. So, by default, these characters are escaped anyway.

The JSON_UNESCAPED_UNICODE constant outputs unescaped Unicode, which can save bytes. However, just as the slash character is escaped because it can be dangerous in some contexts, U+2028 & U+2029 are still escaped even with this flag, because they too are dangerous in some contexts. This was not the case at the time you asked your question: the behaviour was added in PHP 7.1.

(These extra escapes can be turned off with JSON_UNESCAPED_SLASHES and JSON_UNESCAPED_LINE_TERMINATORS, respectively.)
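The flag combinations described above can be compared side by side. This is only a sketch and assumes PHP 7.1 or later, where JSON_UNESCAPED_LINE_TERMINATORS is available; the sample string is made up.

```php
<?php
// Assumes PHP 7.1+ (JSON_UNESCAPED_LINE_TERMINATORS does not exist before that).
// Sample value: "é" (UTF-8 bytes), a raw U+2028, and a forward slash.
$s = "\xc3\xa9\xe2\x80\xa8/";

echo json_encode($s), "\n";
// "\u00e9\u2028\/"

echo json_encode($s, JSON_UNESCAPED_UNICODE), "\n";
// "é\u2028\/"   (U+2028 is still escaped)

echo json_encode($s, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES), "\n";
// "é\u2028/"

// Only this combination emits the raw line separator, which is what
// breaks a JSONP response in older JavaScript engines.
$unsafe = json_encode($s, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_LINE_TERMINATORS);
var_dump(strpos($unsafe, "\xe2\x80\xa8") !== false); // bool(true)
```

For a JSONP response, the safest choice is probably to leave JSON_UNESCAPED_LINE_TERMINATORS off, whichever other flags are used.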