Convert ASCII byte[] to String


ASCII is one of the few encodings that can be converted to and from UTF-16 with no arithmetic or table lookups, so it's possible to convert manually:

    String convert(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length);
        for (int i = 0; i < data.length; i++) {
            // Java bytes are signed, so any value above 0x7F shows up as negative.
            if (data[i] < 0) throw new IllegalArgumentException();
            sb.append((char) data[i]);
        }
        return sb.toString();
    }

But make sure it really is ASCII, or you'll end up with garbage.
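If you would rather let the JDK do the checking, one option is a strict `CharsetDecoder`. The sketch below is a different technique from the loop above, not a rewrite of it: it assumes Java 7 or later (for `StandardCharsets`), and the class and method names `StrictAscii` and `decodeStrict` are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictAscii {
    // Decodes data as US-ASCII and throws instead of silently
    // substituting a replacement character for bytes above 0x7F.
    static String decodeStrict(byte[] data) throws CharacterCodingException {
        return StandardCharsets.US_ASCII
                .newDecoder() // decoders are not thread-safe, so create one per call
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data))
                .toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        System.out.println(decodeStrict(new byte[] {72, 105})); // prints "Hi"
        try {
            decodeStrict(new byte[] {72, (byte) 0xE9}); // 0xE9 is not ASCII
        } catch (CharacterCodingException e) {
            System.out.println("Rejected non-ASCII input");
        }
    }
}
```

The trade-off is a checked exception at the call site in exchange for never producing garbage output.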


What you want to do is delay processing of the byte[] array until log4j decides that it actually wants to log the message. This way you could log it at DEBUG level, for example, while testing and then disable it during production. For example, you could:

    final byte[] myArray = ...;
    Logger.getLogger(MyClass.class).debug(new Object() {
        @Override
        public String toString() {
            return new String(myArray);
        }
    });

Now you don't pay the speed penalty unless you actually log the data, because the toString method isn't called until log4j decides it'll actually log the message!
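If you log byte arrays in more than one place, the anonymous class can be pulled out into a small reusable wrapper. The following is only a sketch: `LazyAscii`, its `conversions` counter, and the `log` method are hypothetical names, and `log` is a stand-in for a real logger so the example runs without log4j on the classpath. It also decodes as US-ASCII rather than the platform default, which is an assumption about your data.

```java
import java.nio.charset.StandardCharsets;

final class LazyAscii {
    // Demo-only counter so we can observe when the conversion really happens.
    static int conversions = 0;

    private final byte[] data;

    LazyAscii(byte[] data) {
        this.data = data; // keeps a reference, no copy and no decoding yet
    }

    @Override
    public String toString() {
        conversions++;
        return new String(data, StandardCharsets.US_ASCII);
    }

    // Hypothetical stand-in for a logger: it only renders the message
    // (and therefore only calls toString) when the level is enabled.
    static void log(boolean enabled, Object message) {
        if (enabled) {
            System.out.println(message);
        }
    }

    public static void main(String[] args) {
        Object message = new LazyAscii(new byte[] {79, 75});
        log(false, message);              // disabled: nothing is decoded
        System.out.println(conversions);  // prints 0
        log(true, message);               // enabled: prints "OK"
        System.out.println(conversions);  // prints 1
    }
}
```

With a real logger you would pass `new LazyAscii(myArray)` wherever the anonymous `Object` was used. Because the wrapper holds a reference rather than a copy, the array should not be modified before the message is rendered.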

I'm not sure what you mean by "the obvious representation", so I've assumed that you mean converting to a String by interpreting the bytes in the platform's default character encoding. If you are dealing with binary data, that is obviously worthless. In that case I'd suggest using Arrays.toString(byte[]) to create a formatted string along the lines of

[54, 23, 65, ...]
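As a quick illustration of that output format, here is a minimal sketch; the class name `ByteDump` and the sample values are made up. Note that bytes are printed as signed decimal values, so anything above 0x7F appears negative.

```java
import java.util.Arrays;

final class ByteDump {
    // Formats the raw byte values without assuming any character encoding.
    static String dump(byte[] data) {
        return Arrays.toString(data);
    }

    public static void main(String[] args) {
        byte[] data = {54, 23, 65, (byte) 0xFF};
        System.out.println(dump(data)); // prints [54, 23, 65, -1]
    }
}
```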


If your data is in fact ASCII (i.e. 7-bit data), then you should be using new String(data, "US-ASCII") instead of depending on the platform default encoding. This may also be faster than decoding with the platform default, which could be UTF-8 and therefore has to inspect multi-byte sequences.

You could also speed this up by avoiding the Charset lookup on each call: cache the Charset instance and call new String(data, charset) instead.
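A minimal sketch of the cached-Charset approach follows. It assumes Java 7 or later, where `StandardCharsets.US_ASCII` already provides a shared instance, so no lookup by name is needed at all; on older versions `Charset.forName("US-ASCII")` stored in a static field would play the same role. The class name `AsciiDecoder` is made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class AsciiDecoder {
    // Resolved once and reused for every conversion.
    private static final Charset ASCII = StandardCharsets.US_ASCII;

    static String decode(byte[] data) {
        // Unlike the String-name overload, this constructor does not
        // throw UnsupportedEncodingException.
        return new String(data, ASCII);
    }

    public static void main(String[] args) {
        byte[] data = {72, 101, 108, 108, 111};
        System.out.println(decode(data)); // prints "Hello"
    }
}
```

Keep in mind that this constructor does not reject bytes above 0x7F; it substitutes the charset's replacement character for them, so it is not a validity check.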

Having said that: it's been a very, very long time since I've seen real ASCII data in a production environment.