
Split string to equal length substrings in Java


Here's the regex one-liner version:

System.out.println(Arrays.toString(
    "Thequickbrownfoxjumps".split("(?<=\\G.{4})")
));

\G is a zero-width assertion that matches the position where the previous match ended. If there was no previous match, it matches the beginning of the input, the same as \A. The enclosing lookbehind matches the position that's four characters along from the end of the last match.
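To see those positions directly, here is a minimal sketch that runs the same pattern through a `Matcher` and records where each zero-width match lands. `split()` cuts the string at exactly these offsets. The class and method names are hypothetical, made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GAnchorDemo {
    // Hypothetical helper: collects every offset where the one-liner's
    // pattern matches. Each match succeeds only four chars past the end
    // of the previous match (or past the start of input, for the first one).
    static List<Integer> matchPositions(String text) {
        Matcher m = Pattern.compile("(?<=\\G.{4})").matcher(text);
        List<Integer> positions = new ArrayList<>();
        while (m.find()) {
            // start() == end() here: the lookbehind and \G consume nothing.
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(matchPositions("Thequickbrownfoxjumps"));
    }
}
```

For the 21-character sample string this should report offsets 4, 8, 12, 16 and 20, which is where the six pieces in the one-liner's output begin and end.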

Both lookbehind and \G are advanced regex features, not supported by all flavors. Furthermore, \G is not implemented consistently across the flavors that do support it. This trick will work (for example) in Java, Perl, .NET and JGSoft, but not in PHP (PCRE), Ruby 1.9+ or TextMate (both Oniguruma). JavaScript's /y (sticky flag) isn't as flexible as \G, and couldn't be used this way even where JS supports lookbehind.

I should mention that I don't necessarily recommend this solution if you have other options. The non-regex solutions in the other answers may be longer, but they're also self-documenting; this one's just about the opposite of that. ;)

Also, this doesn't work on Android, which doesn't support the use of \G in lookbehinds.


Well, it's fairly easy to do this with simple arithmetic and string operations:

public static List<String> splitEqually(String text, int size) {
    // Give the list the right capacity to start with. You could use an array
    // instead if you wanted.
    List<String> ret = new ArrayList<String>((text.length() + size - 1) / size);

    for (int start = 0; start < text.length(); start += size) {
        ret.add(text.substring(start, Math.min(text.length(), start + size)));
    }
    return ret;
}

Note: this assumes a 1:1 mapping of UTF-16 code unit (char, effectively) with "character". That assumption breaks down for characters outside the Basic Multilingual Plane, such as emoji, and (depending on how you want to count things) combining characters.
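If you need to drop that assumption for characters outside the BMP, one option is to count code points instead of chars. Below is a minimal sketch using `String.codePointCount` and `String.offsetByCodePoints`. The class and method names are hypothetical. It still treats each combining character as its own unit; grapheme-aware splitting would need more work, e.g. with `java.text.BreakIterator`.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitByCodePoints {
    // Hypothetical variant of splitEqually: chunk boundaries fall on code
    // points, so a surrogate pair (e.g. an emoji) is never cut in half.
    // Combining characters are still counted separately.
    public static List<String> splitEquallyByCodePoints(String text, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<String> ret = new ArrayList<>();
        int start = 0; // char index where the current chunk begins
        int remaining = text.codePointCount(0, text.length());
        while (remaining > 0) {
            int take = Math.min(size, remaining);
            // Convert "take code points from start" into a char index.
            int end = text.offsetByCodePoints(start, take);
            ret.add(text.substring(start, end));
            start = end;
            remaining -= take;
        }
        return ret;
    }

    public static void main(String[] args) {
        // "a", then U+1F600 (stored as two chars), then "bc".
        String s = "a\uD83D\uDE00bc";
        System.out.println(splitEquallyByCodePoints(s, 2));
    }
}
```

With a chunk size of 2, the char-based `splitEqually` would return `"a\uD83D"`, `"\uDE00b"` and `"c"`, splitting the emoji's surrogate pair. The code-point version keeps it intact and returns `"a\uD83D\uDE00"` and `"bc"`.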

I don't think it's really worth using a regex for this.

EDIT: My reasoning for not using a regex:

  • This doesn't use any of the real pattern matching of regexes. It's just counting.
  • I suspect the above will be more efficient, although in most cases it won't matter.
  • If you need to use variable sizes in different places, you've either got repetition or a helper function to build the regex itself based on a parameter - ick.
  • The regex provided in another answer firstly didn't compile (invalid escaping), and then didn't work. My code worked first time. That's more a testament to the usability of regexes vs plain code, IMO.
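To make the third point concrete, here is a sketch of what a size-parameterized regex helper ends up looking like, next to the arithmetic version from above. The class name and `splitWithRegex` are hypothetical. Note that the chunk size has to be spliced into the pattern text, and the two approaches differ on empty input: `"".split(...)` returns a one-element array containing `""`, whereas `splitEqually("")` returns an empty list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitComparison {
    // Hypothetical regex helper: builds the pattern from the size parameter.
    // Assumes size >= 1 and input without line terminators ('.' doesn't match them by default).
    static List<String> splitWithRegex(String text, int size) {
        return Arrays.asList(text.split("(?<=\\G.{" + size + "})"));
    }

    // The arithmetic version from the answer above.
    static List<String> splitEqually(String text, int size) {
        List<String> ret = new ArrayList<>((text.length() + size - 1) / size);
        for (int start = 0; start < text.length(); start += size) {
            ret.add(text.substring(start, Math.min(text.length(), start + size)));
        }
        return ret;
    }

    public static void main(String[] args) {
        String text = "Thequickbrownfoxjumps";
        for (int size = 1; size <= 5; size++) {
            // Print whether both approaches agree for this chunk size.
            System.out.println(size + ": " + splitWithRegex(text, size).equals(splitEqually(text, size)));
        }
    }
}
```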


This is very easy with Google Guava:

for(final String token :
    Splitter
        .fixedLength(4)
        .split("Thequickbrownfoxjumps")){
    System.out.println(token);
}

Output:

Theq
uick
brow
nfox
jump
s

Or if you need the result as an array, you can use this code:

String[] tokens =
    Iterables.toArray(
        Splitter
            .fixedLength(4)
            .split("Thequickbrownfoxjumps"),
        String.class
    );


Note: Splitter construction is shown inline above, but since Splitters are immutable and reusable, it's good practice to store them in constants:

private static final Splitter FOUR_LETTERS = Splitter.fixedLength(4);

// more code

for(final String token : FOUR_LETTERS.split("Thequickbrownfoxjumps")){
    System.out.println(token);
}