Java equivalent to JavaScript's encodeURIComponent that produces identical output?


This is the class I came up with in the end:

import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

/**
 * Utility class for JavaScript compatible UTF-8 encoding and decoding.
 *
 * @see http://stackoverflow.com/questions/607176/java-equivalent-to-javascripts-encodeuricomponent-that-produces-identical-output
 * @author John Topley
 */
public class EncodingUtil
{
  /**
   * Decodes the passed UTF-8 String using an algorithm that's compatible with
   * JavaScript's <code>decodeURIComponent</code> function. Returns
   * <code>null</code> if the String is <code>null</code>.
   *
   * @param s The UTF-8 encoded String to be decoded
   * @return the decoded String
   */
  public static String decodeURIComponent(String s)
  {
    if (s == null)
    {
      return null;
    }

    String result = null;

    try
    {
      // URLDecoder turns "+" into a space, but decodeURIComponent keeps it
      // as a literal plus sign, so protect plus signs before decoding.
      result = URLDecoder.decode(s.replace("+", "%2B"), "UTF-8");
    }

    // This exception should never occur.
    catch (UnsupportedEncodingException e)
    {
      result = s;
    }

    return result;
  }

  /**
   * Encodes the passed String as UTF-8 using an algorithm that's compatible
   * with JavaScript's <code>encodeURIComponent</code> function. Returns
   * <code>null</code> if the String is <code>null</code>.
   *
   * @param s The String to be encoded
   * @return the encoded String
   */
  public static String encodeURIComponent(String s)
  {
    // URLEncoder.encode throws NullPointerException on null, so check first.
    if (s == null)
    {
      return null;
    }

    String result = null;

    try
    {
      result = URLEncoder.encode(s, "UTF-8")
                         .replaceAll("\\+", "%20")
                         .replaceAll("\\%21", "!")
                         .replaceAll("\\%27", "'")
                         .replaceAll("\\%28", "(")
                         .replaceAll("\\%29", ")")
                         .replaceAll("\\%7E", "~");
    }

    // This exception should never occur.
    catch (UnsupportedEncodingException e)
    {
      result = s;
    }

    return result;
  }

  /**
   * Private constructor to prevent this class from being instantiated.
   */
  private EncodingUtil()
  {
    super();
  }
}


Looking at the implementation differences, I see that:

MDC (Mozilla Developer Center, now MDN) on encodeURIComponent():

  • literal characters (regex representation): [-a-zA-Z0-9._*~'()!]
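Because that list fully defines which characters stay literal, one could also skip URLEncoder and encode directly against it. The following is a minimal sketch of that alternative (not the approach used in the class above). The class name DirectEncode is hypothetical, and it assumes Java 7+ for StandardCharsets. One known difference from JavaScript: getBytes replaces a lone surrogate with "?" (so it comes out as %3F), whereas encodeURIComponent throws a URIError.

```java
import java.nio.charset.StandardCharsets;

public class DirectEncode {
    // Characters encodeURIComponent leaves unescaped, per the MDC list above.
    private static final String UNRESERVED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.!~*'()";

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    public static String encodeURIComponent(String s) {
        // Work on the UTF-8 bytes, as encodeURIComponent does.
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        StringBuilder out = new StringBuilder(bytes.length * 3);
        for (byte b : bytes) {
            int c = b & 0xFF; // treat the byte as unsigned
            if (c < 0x80 && UNRESERVED.indexOf(c) >= 0) {
                out.append((char) c);
            } else {
                // Percent-escape with uppercase hex digits.
                out.append('%').append(HEX[c >> 4]).append(HEX[c & 0x0F]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints %22A%22%20B%20%C2%B1%20%22
        System.out.println(encodeURIComponent("\"A\" B \u00B1 \""));
    }
}
```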

Java 1.5.0 documentation on URLEncoder:

  • literal characters (regex representation): [-a-zA-Z0-9._*]
  • the space character " " is converted into a plus sign "+".
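To see the gap concretely, the sketch below passes a space plus the characters that encodeURIComponent leaves alone through URLEncoder with no post-processing. The class and method names are illustrative only.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class RawURLEncoderDemo {
    // Plain URLEncoder output, with no post-processing.
    public static String raw(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always supported, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The space and [!'()~] are treated differently from encodeURIComponent;
        // '*' is left literal by both.
        System.out.println(raw("a b!'()~*")); // prints a+b%21%27%28%29%7E*
    }
}
```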

So basically, to get the desired result, use URLEncoder.encode(s, "UTF-8") and then do some post-processing:

  • replace all occurrences of "+" with "%20"
  • replace all occurrences of "%xx" representing any of [~'()!] back to their literal counterparts
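Those two post-processing steps can be sketched as follows. This variant assumes Java 10+ for the URLEncoder.encode(String, Charset) overload, which avoids the checked UnsupportedEncodingException, and uses the literal String.replace instead of the regex-based replaceAll. PostProcessEncode is a hypothetical name. Replacing "+" is safe because URLEncoder has already turned any literal plus sign into %2B, and the "%xx" replacements can only match whole escapes because every "%" in URLEncoder's output starts one.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PostProcessEncode {
    public static String encodeURIComponent(String s) {
        if (s == null) {
            return null;
        }
        // Step 1: URLEncoder's "+" stands for a space; a real '+' is already %2B.
        // Step 2: restore the characters encodeURIComponent leaves literal.
        return URLEncoder.encode(s, StandardCharsets.UTF_8)
                .replace("+", "%20")
                .replace("%21", "!")
                .replace("%27", "'")
                .replace("%28", "(")
                .replace("%29", ")")
                .replace("%7E", "~");
    }

    public static void main(String[] args) {
        System.out.println(encodeURIComponent("a+b !'()~*")); // prints a%2Bb%20!'()~*
    }
}
```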


Using the JavaScript engine that ships with Java 6 (later JDKs bundled Nashorn instead, which was removed in Java 15):

import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

public class Wow
{
    public static void main(String[] args) throws Exception
    {
        ScriptEngineManager factory = new ScriptEngineManager();
        ScriptEngine engine = factory.getEngineByName("JavaScript");
        engine.eval("print(encodeURIComponent('\"A\" B ± \"'))");
    }
}

Output: %22A%22%20B%20%c2%b1%20%22

The hex digits come out in lowercase (%c2%b1 rather than %C2%B1), but it's closer to what you want.
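If you take the script-engine route and need the output to match exactly, a small post-processing step can upper-case the escapes. This is a minimal sketch with a hypothetical class name; it only touches "%xx" sequences, so literal letters in the encoded string are left alone.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PercentCase {
    // A percent sign followed by exactly two hex digits of either case.
    private static final Pattern ESCAPE = Pattern.compile("%[0-9a-fA-F]{2}");

    public static String upperCaseEscapes(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // The replacement contains no '$' or '\', so it is safe to pass directly.
            m.appendReplacement(sb, m.group().toUpperCase(Locale.ROOT));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints %22A%22%20B%20%C2%B1%20%22
        System.out.println(upperCaseEscapes("%22A%22%20B%20%c2%b1%20%22"));
    }
}
```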