Java equivalent to JavaScript's encodeURIComponent that produces identical output?
This is the class I came up with in the end:
```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

/**
 * Utility class for JavaScript compatible UTF-8 encoding and decoding.
 *
 * @see http://stackoverflow.com/questions/607176/java-equivalent-to-javascripts-encodeuricomponent-that-produces-identical-output
 * @author John Topley
 */
public class EncodingUtil
{
  /**
   * Decodes the passed UTF-8 String using an algorithm that's compatible with
   * JavaScript's <code>decodeURIComponent</code> function. Returns
   * <code>null</code> if the String is <code>null</code>.
   *
   * @param s The UTF-8 encoded String to be decoded
   * @return the decoded String
   */
  public static String decodeURIComponent(String s)
  {
    if (s == null)
    {
      return null;
    }

    String result = null;

    try
    {
      // URLDecoder turns "+" into a space, but decodeURIComponent leaves
      // "+" alone, so protect literal plus signs before decoding.
      result = URLDecoder.decode(s.replace("+", "%2B"), "UTF-8");
    }

    // This exception should never occur.
    catch (UnsupportedEncodingException e)
    {
      result = s;
    }

    return result;
  }

  /**
   * Encodes the passed String as UTF-8 using an algorithm that's compatible
   * with JavaScript's <code>encodeURIComponent</code> function. Returns
   * <code>null</code> if the String is <code>null</code>.
   *
   * @param s The String to be encoded
   * @return the encoded String
   */
  public static String encodeURIComponent(String s)
  {
    // Without this check URLEncoder.encode would throw a NullPointerException.
    if (s == null)
    {
      return null;
    }

    String result = null;

    try
    {
      result = URLEncoder.encode(s, "UTF-8")
                         .replaceAll("\\+", "%20")
                         .replaceAll("\\%21", "!")
                         .replaceAll("\\%27", "'")
                         .replaceAll("\\%28", "(")
                         .replaceAll("\\%29", ")")
                         .replaceAll("\\%7E", "~");
    }

    // This exception should never occur.
    catch (UnsupportedEncodingException e)
    {
      result = s;
    }

    return result;
  }

  /**
   * Private constructor to prevent this class from being instantiated.
   */
  private EncodingUtil()
  {
    super();
  }
}
```
Looking at the implementation differences, I see that:

JavaScript's encodeURIComponent():
- literal characters (regex representation): [-a-zA-Z0-9._*~'()!]

Java 1.5.0 documentation on URLEncoder:
- literal characters (regex representation): [-a-zA-Z0-9._*]
- the space character " " is converted into a plus sign "+".
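To see exactly which characters the two literal sets disagree on, a small sketch can run every printable ASCII character through URLEncoder and compare it with what encodeURIComponent would produce according to the set above (literal, or %XX with uppercase hex). The class and method names here are hypothetical, chosen just for illustration:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class LiteralSetDiff {
    // What encodeURIComponent produces for one ASCII character, per the
    // [-a-zA-Z0-9._*~'()!] literal set; everything else becomes %XX.
    static String jsEncode(char c) {
        boolean literal = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || "-._*~'()!".indexOf(c) >= 0;
        return literal ? String.valueOf(c) : String.format("%%%02X", (int) c);
    }

    // What java.net.URLEncoder produces for the same character.
    static String javaEncode(char c) {
        try {
            return URLEncoder.encode(String.valueOf(c), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always supported, so this should never happen.
            throw new IllegalStateException(e);
        }
    }

    // Collects every printable ASCII character whose two encodings differ.
    static String differingChars() {
        StringBuilder diff = new StringBuilder();
        for (char c = 0x20; c < 0x7F; c++) {
            if (!javaEncode(c).equals(jsEncode(c))) {
                diff.append(c);
            }
        }
        return diff.toString();
    }

    public static void main(String[] args) {
        // Prints [ !'()~]
        System.out.println("[" + differingChars() + "]");
    }
}
```

Only the space and the five characters !'()~ differ, which is exactly what the post-processing has to undo.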
So basically, to get the desired result, use URLEncoder.encode(s, "UTF-8") and then do some post-processing:
- replace all occurrences of "+" with "%20"
- replace all occurrences of "%xx" representing any of [~'()!] back to their literal counterparts
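As an alternative to post-processing URLEncoder's output, the same result can be reached by percent-encoding the UTF-8 bytes directly and keeping only the [-a-zA-Z0-9._*~'()!] set literal. This is a swapped-in technique, not the class above; DirectEncoder is a hypothetical name, it assumes Java 7+ (for StandardCharsets), and it assumes well-formed input: JavaScript throws a URIError on a lone surrogate, whereas getBytes silently substitutes '?' (so you get %3F).

```java
import java.nio.charset.StandardCharsets;

public class DirectEncoder {
    // Characters (besides ASCII letters and digits) that stay literal.
    private static final String MARKS = "-._*~'()!";
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    private static boolean isLiteral(int v) {
        return (v >= 'a' && v <= 'z') || (v >= 'A' && v <= 'Z')
                || (v >= '0' && v <= '9') || MARKS.indexOf(v) >= 0;
    }

    // Percent-encodes every UTF-8 byte of s with uppercase hex digits,
    // except bytes that are one of the literal ASCII characters.
    public static String encodeURIComponent(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF; // treat the byte as unsigned 0..255
            if (isLiteral(v)) {
                out.append((char) v);
            } else {
                out.append('%').append(HEX[v >> 4]).append(HEX[v & 0x0F]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints %22A%22%20B%20%C2%B1%20%22
        System.out.println(encodeURIComponent("\"A\" B \u00b1 \""));
    }
}
```

Since no regex replacements are involved, there is no risk of a later replaceAll step touching an escape produced by an earlier one.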
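Using the JavaScript engine that is shipped with Java 6:

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

public class Wow
{
  public static void main(String[] args) throws Exception
  {
    ScriptEngineManager factory = new ScriptEngineManager();
    ScriptEngine engine = factory.getEngineByName("JavaScript");
    // getEngineByName returns null when no JavaScript engine is installed
    // (Nashorn was removed from the JDK in Java 15).
    if (engine == null)
    {
      throw new IllegalStateException("No JavaScript engine available");
    }
    engine.eval("print(encodeURIComponent('\"A\" B ± \"'))");
  }
}
```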
Output: %22A%22%20B%20%c2%b1%20%22
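The case of the hex digits is different (ECMAScript's encodeURIComponent emits uppercase, e.g. %C2%B1 rather than %c2%b1), but it's closer to what you want.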
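If you go the script-engine route, the case difference can be normalized afterwards. A minimal sketch, assuming the engine's output only needs its %xx escapes upper-cased (HexCase and upperCaseEscapes are hypothetical names); it uses appendReplacement/appendTail so it also compiles on older JDKs:

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HexCase {
    private static final Pattern ESCAPE = Pattern.compile("%[0-9a-fA-F]{2}");

    // Upper-cases the hex digits of every %xx escape and leaves all other
    // characters untouched.
    public static String upperCaseEscapes(String encoded) {
        Matcher m = ESCAPE.matcher(encoded);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String upper = m.group().toUpperCase(Locale.ROOT);
            m.appendReplacement(sb, Matcher.quoteReplacement(upper));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints %22A%22%20B%20%C2%B1%20%22
        System.out.println(upperCaseEscapes("%22A%22%20B%20%c2%b1%20%22"));
    }
}
```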