Encode String to UTF-8
String objects in Java represent text as UTF-16 (a sequence of UTF-16 char values), and that encoding can't be changed. The only thing that can have a different encoding is a byte[]. So if you need UTF-8 data, you need a byte[]. If you have a String that contains unexpected data, the problem happened earlier, at the place that converted some binary data to a String using the wrong encoding.
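To make the String versus byte[] distinction concrete, here is a small sketch that encodes a String to UTF-8 bytes with getBytes(StandardCharsets.UTF_8) and decodes them back. The class and method names (Utf8EncodeDemo, encode, decode) are illustrative only, and StandardCharsets requires Java 7 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8EncodeDemo {
    // Encode the String's UTF-16 text into UTF-8 bytes.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode UTF-8 bytes back into a String.
    static String decode(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "h\u00e9llo";              // "héllo"; é is U+00E9
        byte[] utf8 = encode(text);
        System.out.println(text.length());       // 5 UTF-16 char values
        System.out.println(utf8.length);         // 6 bytes: é takes two bytes in UTF-8
        System.out.println(Arrays.toString(utf8)); // [104, -61, -87, 108, 108, 111]
        System.out.println(decode(utf8).equals(text)); // true
    }
}
```

Note that the String itself never "becomes" UTF-8; only the byte[] returned by getBytes carries that encoding.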
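In Java 7 and later you can use:

import static java.nio.charset.StandardCharsets.*;

// Recover the original bytes (assumes myString was wrongly decoded as ISO-8859-1)
byte[] ptext = myString.getBytes(ISO_8859_1);
// Decode those bytes correctly as UTF-8
String value = new String(ptext, UTF_8);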
Unlike getBytes(String), getBytes(Charset) does not declare throws UnsupportedEncodingException, so you don't need a try/catch around it.
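The repair above can be checked end to end with a minimal sketch. It assumes the earlier bug decoded UTF-8 bytes as ISO-8859-1; because ISO-8859-1 maps each byte 0x00 to 0xFF to exactly one char, encoding the garbled String back with ISO-8859-1 recovers the original bytes. If a different wrong charset was involved, this particular repair may not work. The class name MojibakeRepairDemo and the method repair are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeRepairDemo {
    // Undo a wrong decode: get the raw bytes back via ISO-8859-1,
    // then decode them as UTF-8 (assumes the bug used ISO-8859-1).
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "h\u00e9llo"; // "héllo"
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // Simulate the earlier bug: UTF-8 bytes decoded with the wrong charset.
        // The two bytes of é (0xC3 0xA9) become two chars, U+00C3 and U+00A9 ("Ã©").
        String garbled = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length());                  // 6
        System.out.println(repair(garbled).equals(original));  // true
    }
}
```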
If you're on Java 6, which already has getBytes(Charset) and new String(byte[], Charset) but not StandardCharsets, you can declare the charset constants yourself:

import java.nio.charset.Charset;

public class StandardCharsets {
    public static final Charset ISO_8859_1 = Charset.forName("ISO-8859-1");
    public static final Charset UTF_8 = Charset.forName("UTF-8");
    // ...
}