Java equivalent to JavaScript's encodeURIComponent that produces identical output?

This is the class I came up with in the end:

import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

/**
 * Utility class for JavaScript compatible UTF-8 encoding and decoding.
 *
 * @see http://stackoverflow.com/questions/607176/java-equivalent-to-javascripts-encodeuricomponent-that-produces-identical-output
 * @author John Topley
 */
public class EncodingUtil
{
  /**
   * Decodes the passed UTF-8 String using an algorithm that's compatible with
   * JavaScript's <code>decodeURIComponent</code> function. Returns
   * <code>null</code> if the String is <code>null</code>.
   *
   * @param s The UTF-8 encoded String to be decoded
   * @return the decoded String
   */
  public static String decodeURIComponent(String s)
  {
    if (s == null)
    {
      return null;
    }

    String result = null;

    try
    {
      result = URLDecoder.decode(s, "UTF-8");
    }
    // This exception should never occur.
    catch (UnsupportedEncodingException e)
    {
      result = s;
    }

    return result;
  }

  /**
   * Encodes the passed String as UTF-8 using an algorithm that's compatible
   * with JavaScript's <code>encodeURIComponent</code> function. Returns
   * <code>null</code> if the String is <code>null</code>.
   *
   * @param s The String to be encoded
   * @return the encoded String
   */
  public static String encodeURIComponent(String s)
  {
    // URLEncoder.encode throws a NullPointerException for null input,
    // so return early to honour the documented contract.
    if (s == null)
    {
      return null;
    }

    String result = null;

    try
    {
      // Undo the differences between form encoding and encodeURIComponent.
      result = URLEncoder.encode(s, "UTF-8")
                         .replaceAll("\\+", "%20")
                         .replaceAll("\\%21", "!")
                         .replaceAll("\\%27", "'")
                         .replaceAll("\\%28", "(")
                         .replaceAll("\\%29", ")")
                         .replaceAll("\\%7E", "~");
    }
    // This exception should never occur.
    catch (UnsupportedEncodingException e)
    {
      result = s;
    }

    return result;
  }

  /**
   * Private constructor to prevent this class from being instantiated.
   */
  private EncodingUtil()
  {
    super();
  }
}

java javascript unicode utf-8

Looking at the implementation differences, I see that:

MDC on encodeURIComponent():

literal characters (regex representation): [-a-zA-Z0-9._*~'()!]

Java 1.5.0 documentation on URLEncoder:

literal characters (regex representation): [-a-zA-Z0-9._*]
the space character " " is converted into a plus sign "+".

So basically, to get the desired result, use URLEncoder.encode(s, "UTF-8") and then do some post-processing:

replace all occurrences of "+" with "%20"
replace all occurrences of "%xx" that represent any of [~'()!] with their literal counterparts
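To make the two steps concrete, here is a minimal sketch that prints the raw URLEncoder output next to the post-processed result for one sample string. The class name PostProcessDemo, the helpers formEncode and postProcess, and the sample input are made up for illustration; only URLEncoder.encode is the real API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class PostProcessDemo {

    // Hypothetical helper: plain URLEncoder output (form encoding), before any fix-ups.
    static String formEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: the two post-processing steps described above.
    // A literal '+' in the input is already encoded as %2B, so any '+'
    // remaining in the encoded text stands for a space.
    static String postProcess(String formEncoded) {
        return formEncoded
                .replace("+", "%20")   // step 1: spaces
                .replace("%21", "!")   // step 2: restore [~'()!]
                .replace("%27", "'")
                .replace("%28", "(")
                .replace("%29", ")")
                .replace("%7E", "~");
    }

    public static void main(String[] args) {
        String input = "Hello World! (it's ~ok)";
        String raw = formEncode(input);
        System.out.println(raw);              // Hello+World%21+%28it%27s+%7Eok%29
        System.out.println(postProcess(raw)); // Hello%20World!%20(it's%20~ok)
    }
}
```

The sketch uses String.replace, which matches literal text, rather than replaceAll, so no regex escaping is needed; either works for these fixed patterns.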


Using the JavaScript engine that ships with Java 6:

import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

public class Wow
{
    public static void main(String[] args) throws Exception
    {
        ScriptEngineManager factory = new ScriptEngineManager();
        ScriptEngine engine = factory.getEngineByName("JavaScript");
        engine.eval("print(encodeURIComponent('\"A\" B ± \"'))");
    }
}

Output: %22A%22%20B%20%c2%b1%20%22

The hex digits in the escapes come out in lowercase (%c2%b1 rather than %C2%B1), but otherwise it's closer to what you want.
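If the case difference matters, one option is to upper-case the percent-escapes afterwards. The following is only a sketch under that assumption; the class EscapeCase and the method upperCaseEscapes are hypothetical names, not part of any library.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeCase {

    // Matches one percent-escape such as %c2 or %2F.
    private static final Pattern ESCAPE = Pattern.compile("%[0-9a-fA-F]{2}");

    // Hypothetical helper: upper-cases the hex digits of every escape
    // and leaves all other characters untouched.
    static String upperCaseEscapes(String encoded) {
        Matcher m = ESCAPE.matcher(encoded);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // The replacement contains no '$' or '\', so it is used literally.
            m.appendReplacement(sb, m.group().toUpperCase(Locale.ROOT));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Input is the output shown above.
        System.out.println(upperCaseEscapes("%22A%22%20B%20%c2%b1%20%22"));
        // prints %22A%22%20B%20%C2%B1%20%22
    }
}
```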
