
Converting bytes[] to string in HBase


The standard HBase way of converting strings is Bytes.toBytes(String) and Bytes.toString(byte[]). But Jon Skeet is correct that you need to consider how you put the data into the column in the first place. If you used Bytes.toBytes(int), then you need to convert the bytes back into an integer (for example with Bytes.toInt(byte[])) before you convert it to a string.
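A minimal sketch of why the write-side type matters, using only the JDK so it runs without HBase on the classpath. It assumes, as HBase's `Bytes` class does, that `Bytes.toBytes(int)` writes 4 big-endian bytes and that `Bytes.toBytes(String)`/`Bytes.toString(byte[])` use UTF-8. The `intToBytes`/`bytesToInt` helpers are hypothetical stand-ins for those methods, not HBase API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class RoundTrip {
    // Stand-in for Bytes.toBytes(int): 4 bytes, big-endian (ByteBuffer's default order).
    static byte[] intToBytes(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Stand-in for Bytes.toInt(byte[]): read the 4 bytes back as a big-endian int.
    static int bytesToInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    public static void main(String[] args) {
        // Value stored as a string: decode the bytes straight back into a string.
        byte[] storedString = "42".getBytes(StandardCharsets.UTF_8);
        String fromString = new String(storedString, StandardCharsets.UTF_8);

        // Value stored as an int: convert to an int first, then to a string.
        byte[] storedInt = intToBytes(42);
        String fromInt = String.valueOf(bytesToInt(storedInt));

        System.out.println(fromString); // 42
        System.out.println(fromInt);    // 42
        // Decoding the int's bytes directly as text does NOT give "42".
        System.out.println(new String(storedInt, StandardCharsets.UTF_8).equals("42")); // false
    }
}
```

Both paths print "42", but only because each reader matches how the value was written.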


We have simply used new String(byte[]), where the byte[] comes from org.apache.hadoop.hbase.KeyValue.getValue(), to parse the bytes from an HBase column as a string, and it has worked fine for our projects. :) Sorry if I missed something in the question. Hope this helps.
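If you go the `new String(byte[])` route, here is a sketch of the same call with the charset spelled out, so the result doesn't depend on the JVM's default encoding. It assumes the column was written as UTF-8 text (which is what `Bytes.toBytes(String)` produces). `decodeCell` is a hypothetical helper, and the local `byte[]` stands in for whatever `KeyValue.getValue()` returned.

```java
import java.nio.charset.StandardCharsets;

public class DecodeCell {
    // Hypothetical helper: decode a cell value that was written as UTF-8 text.
    static String decodeCell(byte[] value) {
        // Explicit charset gives the same result on every JVM, unlike new String(value).
        return new String(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stand-in for the bytes returned by KeyValue.getValue(); "caf" + U+00E9.
        byte[] value = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeCell(value).equals("caf\u00e9")); // true
    }
}
```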


Firstly, I'd avoid using String.getBytes() without specifying an encoding. What encoding does the code actually expect? Specify it explicitly when you call "DIE".getBytes() and "ID".getBytes().
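A small sketch of why the choice matters. UTF-8 and ISO-8859-1 are used here purely as two example charsets, since which one the original code expects isn't stated. For ASCII-only strings like "DIE" and "ID" they agree, but they diverge as soon as a non-ASCII character appears, so the writer and reader must use the same one. `encodedLength` is a hypothetical helper.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ExplicitEncoding {
    // Number of bytes the string occupies once encoded with the given charset.
    static int encodedLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // ASCII-only text such as "DIE": UTF-8 and ISO-8859-1 give identical bytes.
        System.out.println(Arrays.equals(
                "DIE".getBytes(StandardCharsets.UTF_8),
                "DIE".getBytes(StandardCharsets.ISO_8859_1))); // true

        // Non-ASCII text: U+00E9 is 2 bytes in UTF-8 but 1 byte in ISO-8859-1.
        String eAcute = "\u00e9";
        System.out.println(encodedLength(eAcute, StandardCharsets.UTF_8));      // 2
        System.out.println(encodedLength(eAcute, StandardCharsets.ISO_8859_1)); // 1

        // Decoding with a different charset than the writer used changes the text.
        String misread = new String(eAcute.getBytes(StandardCharsets.UTF_8),
                                    StandardCharsets.ISO_8859_1);
        System.out.println(misread.equals(eAcute)); // false
    }
}
```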

Next, it looks like you should be converting the 4 bytes into an integer first - then convert that integer into a string. For example:

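```java
byte[] valueAsBytes = ...; // the 4 bytes read from the column
// "& 0xff" stops negative bytes from being sign-extended before shifting.
int valueAsInt = ((valueAsBytes[0] & 0xff) << 24) |
                 ((valueAsBytes[1] & 0xff) << 16) |
                 ((valueAsBytes[2] & 0xff) << 8) |
                 (valueAsBytes[3] & 0xff);
String valueAsString = String.valueOf(valueAsInt);
```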

There may well be something in the Java API to do the bit manipulation directly, but I can't think of it right now. (There's DataInputStream, but that would require wrapping the byte array in a ByteArrayInputStream first, then you'd need to check the endianness...)
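For what it's worth, the JDK can do this decoding directly. Below is a sketch comparing the manual shifts with `java.nio.ByteBuffer` (big-endian unless told otherwise) and with `DataInputStream.readInt()` (always big-endian). It assumes the 4 bytes were written most-significant byte first, as HBase's `Bytes.toBytes(int)` does. The little-endian variant only shows what changes if that assumption is wrong.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class BytesToInt {
    // Manual big-endian decode, same bit manipulation as the snippet above.
    static int manual(byte[] b) {
        return ((b[0] & 0xff) << 24) | ((b[1] & 0xff) << 16)
             | ((b[2] & 0xff) << 8) | (b[3] & 0xff);
    }

    // ByteBuffer reads big-endian by default.
    static int viaByteBuffer(byte[] b) {
        return ByteBuffer.wrap(b).getInt();
    }

    // DataInputStream.readInt() always reads big-endian.
    static int viaDataInputStream(byte[] b) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(b))) {
            return in.readInt();
        }
    }

    // If the data had been written little-endian, ByteBuffer can be told so.
    static int littleEndian(byte[] b) {
        return ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = {0, 0, 0, 1};
        System.out.println(manual(bytes));             // 1
        System.out.println(viaByteBuffer(bytes));      // 1
        System.out.println(viaDataInputStream(bytes)); // 1
        System.out.println(littleEndian(bytes));       // 16777216
    }
}
```

`ByteBuffer.wrap(...).getInt()` avoids both the hand-written shifts and the stream wrapping, while keeping the byte order explicit in one place.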

Your current code is doing exactly what you asked it to, admittedly with the platform's default encoding: you've basically got "\u0000\u0000\u0000\u0001".
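A sketch reproducing that result. It assumes the column holds the int 1 stored as 4 big-endian bytes; the question's actual data isn't shown here. Decoding those bytes as text yields four control characters, whereas going through an int first yields "1". US_ASCII is passed explicitly instead of relying on the platform default, so the output is deterministic. Bytes 0x00 and 0x01 also decode to U+0000 and U+0001 under UTF-8 and ISO-8859-1.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class WrongDecode {
    // Treats each byte as a character: 0x00 becomes U+0000, 0x01 becomes U+0001.
    static String decodeAsText(byte[] b) {
        return new String(b, StandardCharsets.US_ASCII);
    }

    // Reads the 4 bytes as a big-endian int, then formats it as decimal text.
    static String decodeAsInt(byte[] b) {
        return String.valueOf(ByteBuffer.wrap(b).getInt());
    }

    public static void main(String[] args) {
        byte[] stored = ByteBuffer.allocate(4).putInt(1).array(); // {0, 0, 0, 1}
        String asText = decodeAsText(stored);
        System.out.println(asText.equals("\u0000\u0000\u0000\u0001")); // true
        System.out.println(asText.length());                            // 4
        System.out.println(decodeAsInt(stored));                        // 1
    }
}
```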