Java - Count exactly 60 characters from a string with a mixture of UTF-8 and non UTF-8 characters

java string oracle encoding character-encoding

As far as I understand, you want to limit the String length so that its UTF-8 encoded representation does not exceed 60 bytes. You can do it this way:

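    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.charset.CharsetEncoder;
    import java.nio.charset.CoderResult;
    import java.nio.charset.StandardCharsets;

    String s = …;
    CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
    ByteBuffer bb = ByteBuffer.allocate(60); // note the limit
    CharBuffer cb = CharBuffer.wrap(s);
    CoderResult r = enc.encode(cb, bb, true);
    if (r.isOverflow()) {
        System.out.println(s + " is too long for "
                + bb.capacity() + " " + enc.charset() + " bytes");
        // flip() restricts the buffer to the chars that were encoded before the overflow
        s = cb.flip().toString();
        System.out.println("truncated to " + s);
    }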
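For reuse, the encoder approach above can be wrapped in a method. The following is a minimal sketch, not part of the original answer: the class name `Utf8EncoderTruncate` and the method `truncate` are hypothetical, and the input is assumed to be well-formed UTF-16 (a lone surrogate is rejected rather than handled).

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named here for illustration only.
class Utf8EncoderTruncate {

    // Returns the longest prefix of s whose UTF-8 encoding fits in maxBytes.
    static String truncate(String s, int maxBytes) {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        ByteBuffer bb = ByteBuffer.allocate(maxBytes); // the byte budget
        CharBuffer cb = CharBuffer.wrap(s);
        CoderResult r = enc.encode(cb, bb, true);
        if (r.isError()) {
            // Assumption: malformed input (e.g. a lone surrogate) is treated as a caller error.
            throw new IllegalArgumentException("input is not well-formed UTF-16");
        }
        if (r.isOverflow()) {
            // cb.position() is the number of chars encoded before the buffer filled up.
            return s.substring(0, cb.position());
        }
        return s; // everything fit
    }
}
```

Calling `Utf8EncoderTruncate.truncate(s, 60)` would then give the 60-byte limit from the question. Because the encoder only consumes a character when all of its bytes fit, a multi-byte character at the boundary is dropped whole rather than split.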


This is my quick hack: a function that truncates a string to a given number of bytes in UTF-8 encoding:

    public static String truncateUtf8(String original, int byteCount) {
        // A char never needs more than 3 bytes, so short strings fit as they are
        if (original.length() * 3 <= byteCount) {
            return original;
        }
        StringBuilder sb = new StringBuilder();
        int count = 0; // UTF-8 bytes used so far
        for (int i = 0; i < original.length(); i++) {
            char c = original.charAt(i);
            int newCount;
            if (c <= 0x7f) newCount = count + 1;        // ASCII
            else if (c <= 0x7ff) newCount = count + 2;  // two-byte range
            else newCount = count + 3;                  // rest of the BMP
            if (newCount > byteCount) {
                break;
            }
            count = newCount;
            sb.append(c);
        }
        return sb.toString();
    }

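It does not work as expected for characters outside the BMP: each one is stored as a surrogate pair of two chars, and the loop counts each half as 3 bytes, so the character is counted as 6 bytes instead of 4. It can also cut a surrogate pair in half, and it may break grapheme clusters. But for most simple tasks it should be OK.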

    truncateUtf8("e", 1) => "e"
    truncateUtf8("ée", 1) => ""
    truncateUtf8("ée", 2) => "é"
    truncateUtf8("ée", 3) => "ée"
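The limitation with characters outside the BMP can be avoided by walking the string by code point instead of by char. The following is a sketch of that variation, not the original author's code: the class name `Utf8CodePointTruncate` is hypothetical, and it still does not keep grapheme clusters together.

```java
// Hypothetical class name, used for illustration only.
class Utf8CodePointTruncate {

    // Truncates to at most byteCount UTF-8 bytes without splitting a code point.
    static String truncateUtf8(String original, int byteCount) {
        StringBuilder sb = new StringBuilder();
        int count = 0; // UTF-8 bytes used so far
        for (int i = 0; i < original.length(); ) {
            int cp = original.codePointAt(i);
            int size;
            if (cp <= 0x7f) size = 1;          // ASCII
            else if (cp <= 0x7ff) size = 2;    // two-byte range
            else if (cp <= 0xffff) size = 3;   // rest of the BMP
            else size = 4;                     // supplementary characters
            if (count + size > byteCount) {
                break;
            }
            count += size;
            sb.appendCodePoint(cp);
            // Advance by 2 chars for a surrogate pair, otherwise by 1.
            i += Character.charCount(cp);
        }
        return sb.toString();
    }
}
```

With this version a string such as "a" followed by an emoji is cut to "a" for a 4-byte budget and kept whole for a 5-byte budget. An unpaired surrogate is budgeted as 3 bytes here, which is an overestimate rather than an overflow, since Java's UTF-8 encoder replaces it with a single `?` byte.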
