Converting char[] to byte[]


Convert without creating a String object:

    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.charset.Charset;
    import java.util.Arrays;

    byte[] toBytes(char[] chars) {
        CharBuffer charBuffer = CharBuffer.wrap(chars);
        ByteBuffer byteBuffer = Charset.forName("UTF-8").encode(charBuffer);
        // copy only the encoded region; the backing array may be larger
        byte[] bytes = Arrays.copyOfRange(byteBuffer.array(),
                byteBuffer.position(), byteBuffer.limit());
        Arrays.fill(byteBuffer.array(), (byte) 0); // clear sensitive data
        return bytes;
    }

Usage:

    char[] chars = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'};
    byte[] bytes = toBytes(chars);
    /* do something with chars/bytes */
    Arrays.fill(chars, '\u0000'); // clear sensitive data
    Arrays.fill(bytes, (byte) 0); // clear sensitive data

This solution is inspired by the Swing recommendation to store passwords in char[]. (See "Why is char[] preferred over String for passwords?")

Remember not to write sensitive data to logs, and make sure the JVM does not hold any other references to it.
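
One way to make the clearing step harder to forget is to wrap it in try/finally, so the arrays are wiped even if the code using them throws. The following is only a sketch: the class SensitiveBytes and the helper withBytes are hypothetical names invented for this example, and toBytes repeats the method shown above.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.function.Function;

final class SensitiveBytes {
    /** Same conversion as above: encode to UTF-8 and wipe the encoder's buffer. */
    static byte[] toBytes(char[] chars) {
        ByteBuffer byteBuffer = StandardCharsets.UTF_8.encode(CharBuffer.wrap(chars));
        byte[] bytes = Arrays.copyOfRange(byteBuffer.array(),
                byteBuffer.position(), byteBuffer.limit());
        Arrays.fill(byteBuffer.array(), (byte) 0);
        return bytes;
    }

    /** Hypothetical helper: runs the action on the bytes, then wipes both arrays. */
    static <T> T withBytes(char[] chars, Function<byte[], T> action) {
        byte[] bytes = toBytes(chars);
        try {
            return action.apply(bytes);
        } finally {
            Arrays.fill(chars, '\u0000'); // clear sensitive data
            Arrays.fill(bytes, (byte) 0); // clear sensitive data
        }
    }
}
```

The action must not keep a reference to the byte array after it returns, since the array is zeroed as soon as withBytes finishes.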


The code above is correct but not efficient. If you need security rather than performance, you can use it. If security is not a goal either, simply use String.getBytes. The inefficiency becomes apparent when you look at the implementation of encode in the JDK; on top of that, you have to copy arrays and create buffers. Another way to convert is to inline all the code behind encode (example for UTF-8):

    val xs: Array[Char] = "A ß € 嗨 𝄞 🙂".toArray
    val len = xs.length
    val ys: Array[Byte] = new Array(3 * len) // worst case
    var i = 0; var j = 0 // i for chars; j for bytes
    while (i < len) { // fill ys with bytes
      val c = xs(i)
      if (c < 0x80) {
        ys(j) = c.toByte
        i = i + 1
        j = j + 1
      } else if (c < 0x800) {
        ys(j) = (0xc0 | (c >> 6)).toByte
        ys(j + 1) = (0x80 | (c & 0x3f)).toByte
        i = i + 1
        j = j + 2
      } else if (Character.isHighSurrogate(c)) {
        if (len - i < 2) throw new Exception("overflow")
        val d = xs(i + 1)
        val uc: Int =
          if (Character.isLowSurrogate(d)) {
            Character.toCodePoint(c, d)
          } else {
            throw new Exception("malformed")
          }
        ys(j) = (0xf0 | ((uc >> 18))).toByte
        ys(j + 1) = (0x80 | ((uc >> 12) & 0x3f)).toByte
        ys(j + 2) = (0x80 | ((uc >>  6) & 0x3f)).toByte
        ys(j + 3) = (0x80 | (uc & 0x3f)).toByte
        i = i + 2 // 2 chars
        j = j + 4
      } else if (Character.isLowSurrogate(c)) {
        throw new Exception("malformed")
      } else {
        ys(j) = (0xe0 | (c >> 12)).toByte
        ys(j + 1) = (0x80 | ((c >> 6) & 0x3f)).toByte
        ys(j + 2) = (0x80 | (c & 0x3f)).toByte
        i = i + 1
        j = j + 3
      }
    }
    // check
    println(new String(ys, 0, j, "UTF-8"))

Excuse me for using Scala. If you have trouble converting this code to Java, I can rewrite it. As for performance, always measure on real data (with JMH, for example). This code looks very similar to what you can see in the JDK[2] and Protobuf[3].
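
For readers who want it in Java, here is a rough translation of the Scala loop above. It is a sketch, not the JDK's or Protobuf's code: the class name ManualUtf8 and the use of IllegalArgumentException are choices made for this example, and unlike the Scala version it trims the result to the encoded length and wipes the oversized scratch array before returning.

```java
import java.util.Arrays;

final class ManualUtf8 {
    /** Encodes chars as UTF-8 without creating a String or a ByteBuffer. */
    static byte[] encode(char[] xs) {
        int len = xs.length;
        byte[] ys = new byte[3 * len]; // worst case: 3 bytes per char
        int i = 0; // index into chars
        int j = 0; // index into bytes
        while (i < len) {
            char c = xs[i];
            if (c < 0x80) { // 1 byte
                ys[j++] = (byte) c;
                i++;
            } else if (c < 0x800) { // 2 bytes
                ys[j++] = (byte) (0xc0 | (c >> 6));
                ys[j++] = (byte) (0x80 | (c & 0x3f));
                i++;
            } else if (Character.isHighSurrogate(c)) { // 2 chars -> 4 bytes
                if (len - i < 2 || !Character.isLowSurrogate(xs[i + 1])) {
                    Arrays.fill(ys, (byte) 0);
                    throw new IllegalArgumentException("malformed surrogate pair at index " + i);
                }
                int uc = Character.toCodePoint(c, xs[i + 1]);
                ys[j++] = (byte) (0xf0 | (uc >> 18));
                ys[j++] = (byte) (0x80 | ((uc >> 12) & 0x3f));
                ys[j++] = (byte) (0x80 | ((uc >> 6) & 0x3f));
                ys[j++] = (byte) (0x80 | (uc & 0x3f));
                i += 2;
            } else if (Character.isLowSurrogate(c)) {
                Arrays.fill(ys, (byte) 0);
                throw new IllegalArgumentException("unpaired low surrogate at index " + i);
            } else { // 3 bytes
                ys[j++] = (byte) (0xe0 | (c >> 12));
                ys[j++] = (byte) (0x80 | ((c >> 6) & 0x3f));
                ys[j++] = (byte) (0x80 | (c & 0x3f));
                i++;
            }
        }
        byte[] result = Arrays.copyOf(ys, j); // keep only the bytes actually written
        Arrays.fill(ys, (byte) 0); // clear the scratch array
        return result;
    }
}
```

A quick sanity check is to compare ManualUtf8.encode(chars) with new String(chars).getBytes(StandardCharsets.UTF_8) on non-sensitive test data that mixes ASCII, two-byte, three-byte and surrogate-pair characters.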


    char[] ch = ?
    new String(ch).getBytes();

or

    new String(ch).getBytes("UTF-8");

to use a non-default charset.

Update: Since Java 7: new String(ch).getBytes(StandardCharsets.UTF_8);


Edit: Andrey's answer has been updated so the following no longer applies.

Andrey's answer (the highest voted at the time of writing) is slightly incorrect. I would have added this as a comment, but I am not reputable enough.

In Andrey's answer:

    char[] chars = {'c', 'h', 'a', 'r', 's'};
    byte[] bytes = Charset.forName("UTF-8").encode(CharBuffer.wrap(chars)).array();

the call to array() may not return the desired value, for example:

    char[] c = "aaaaaaaaaa".toCharArray();
    System.out.println(Arrays.toString(Charset.forName("UTF-8").encode(CharBuffer.wrap(c)).array()));

output:

[97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 0]

As can be seen a zero byte has been added. To avoid this use the following:

    char[] c = "aaaaaaaaaa".toCharArray();
    ByteBuffer bb = Charset.forName("UTF-8").encode(CharBuffer.wrap(c));
    byte[] b = new byte[bb.remaining()];
    bb.get(b);
    System.out.println(Arrays.toString(b));

output:

[97, 97, 97, 97, 97, 97, 97, 97, 97, 97]
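
The extra byte shows up because array() returns the whole backing array, whose length is the buffer's capacity, while the encoded data only occupies the region between position and limit. A small sketch that exposes those values (the class name BufferBounds is made up for this example, and the exact capacity is an encoder implementation detail that may differ between JDK versions):

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

final class BufferBounds {
    /** Returns {position, limit, capacity} of the buffer produced by encode. */
    static int[] bounds(char[] chars) {
        ByteBuffer bb = StandardCharsets.UTF_8.encode(CharBuffer.wrap(chars));
        return new int[] {bb.position(), bb.limit(), bb.capacity()};
    }

    public static void main(String[] args) {
        int[] b = bounds("aaaaaaaaaa".toCharArray());
        System.out.println("position = " + b[0]);
        System.out.println("limit    = " + b[1]); // number of encoded bytes
        System.out.println("capacity = " + b[2]); // length of array(), may be larger
    }
}
```

Whenever capacity is larger than limit, the bytes past limit in array() are padding and not part of the encoded text.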

As the answer also alluded to using passwords, it might be worth blanking out the array that backs the ByteBuffer (accessed via the array() function):

    ByteBuffer bb = Charset.forName("UTF-8").encode(CharBuffer.wrap(c));
    byte[] b = new byte[bb.remaining()];
    bb.get(b);
    blankOutByteArray(bb.array());
    System.out.println(Arrays.toString(b));
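
The snippet leaves blankOutByteArray undefined. One possible implementation, as a sketch only: the class name ByteBufferWiper and the method encodeAndWipe are invented for this example, and the hasArray() guard is an extra precaution, since array() throws for read-only buffers and for buffers that are not backed by an accessible array.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ByteBufferWiper {
    /** Assumed implementation of the helper: overwrite every element with zero. */
    static void blankOutByteArray(byte[] array) {
        Arrays.fill(array, (byte) 0);
    }

    /** Encodes to UTF-8, copies out the encoded bytes, then wipes the backing array. */
    static byte[] encodeAndWipe(char[] c) {
        ByteBuffer bb = StandardCharsets.UTF_8.encode(CharBuffer.wrap(c));
        byte[] b = new byte[bb.remaining()];
        bb.get(b);
        if (bb.hasArray()) { // only heap buffers expose a writable backing array
            blankOutByteArray(bb.array());
        }
        return b;
    }
}
```

The caller is still responsible for blanking the returned array and the original char[] once they are no longer needed.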