How to split a string, but also keep the delimiters? [java]


You can use Lookahead and Lookbehind. Like this:

    System.out.println(Arrays.toString("a;b;c;d".split("(?<=;)")));
    System.out.println(Arrays.toString("a;b;c;d".split("(?=;)")));
    System.out.println(Arrays.toString("a;b;c;d".split("((?<=;)|(?=;))")));

And you will get:

    [a;, b;, c;, d]
    [a, ;b, ;c, ;d]
    [a, ;, b, ;, c, ;, d]

The last one is what you want.

((?<=;)|(?=;)) matches the empty string (a zero-width position) immediately after a ; or immediately before a ;.

Hope this helps.

EDIT: Fabian Steeg's comment on readability is valid. Readability is always a problem with regular expressions. One thing I do to ease this is to create a variable whose name describes what the regex does, and use Java's String.format to fill in the delimiter. Like this:

    static public final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";
    ...
    public void someMethod() {
        ...
        final String[] aEach = "a;b;c;d".split(String.format(WITH_DELIMITER, ";"));
        ...
    }
    ...

This helps a little bit. :-D
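One caveat with the format-string approach: the delimiter is pasted into the pattern as raw regex, so a delimiter such as "." or "|" would be read as a metacharacter. A minimal sketch of one way around that, assuming a hypothetical helper named splitKeepingDelimiter (not part of the answer above) that passes the delimiter through Pattern.quote first:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitKeep {
    // Same lookaround template as WITH_DELIMITER in the answer above.
    static final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";

    // Hypothetical helper: quotes the delimiter so it is matched literally,
    // then splits at the zero-width positions just before and after it.
    static String[] splitKeepingDelimiter(String input, String delimiter) {
        String quoted = Pattern.quote(delimiter);
        return input.split(String.format(WITH_DELIMITER, quoted));
    }

    public static void main(String[] args) {
        // prints [a, ;, b, ;, c, ;, d]
        System.out.println(Arrays.toString(splitKeepingDelimiter("a;b;c;d", ";")));
        // prints [aa, ., bb, ., cc]
        System.out.println(Arrays.toString(splitKeepingDelimiter("aa.bb.cc", ".")));
    }
}
```

Without the quoting step, the "." in the second call would match any character and the string would be split between every pair of characters.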


You want to use lookarounds, and split on zero-width matches. Here are some examples:

    public class SplitNDump {
        static void dump(String[] arr) {
            for (String s : arr) {
                System.out.format("[%s]", s);
            }
            System.out.println();
        }

        public static void main(String[] args) {
            dump("1,234,567,890".split(","));
            // "[1][234][567][890]"

            dump("1,234,567,890".split("(?=,)"));
            // "[1][,234][,567][,890]"

            dump("1,234,567,890".split("(?<=,)"));
            // "[1,][234,][567,][890]"

            dump("1,234,567,890".split("(?<=,)|(?=,)"));
            // "[1][,][234][,][567][,][890]"

            dump(":a:bb::c:".split("(?=:)|(?<=:)"));
            // "[][:][a][:][bb][:][:][c][:]"
            // (Java 7 and earlier; Java 8+ omits the leading empty string)

            dump(":a:bb::c:".split("(?=(?!^):)|(?<=:)"));
            // "[:][a][:][bb][:][:][c][:]"

            dump(":::a::::b  b::c:".split("(?=(?!^):)(?<!:)|(?!:)(?<=:)"));
            // "[:::][a][::::][b  b][::][c][:]"

            dump("a,bb:::c  d..e".split("(?!^)\\b"));
            // "[a][,][bb][:::][c][  ][d][..][e]"

            dump("ArrayIndexOutOfBoundsException".split("(?<=[a-z])(?=[A-Z])"));
            // "[Array][Index][Out][Of][Bounds][Exception]"

            dump("1234567890".split("(?<=\\G.{4})"));
            // "[1234][5678][90]"

            // Split at the end of each run of repeated letters
            dump("Boooyaaaah! Yippieeee!!".split("(?<=(?=(.)\\1(?!\\1))..)"));
            // "[Booo][yaaaa][h! Yipp][ieeee][!!]"
        }
    }

And yes, that is a triply-nested assertion there in the last pattern.


A very naive solution that doesn't involve lookarounds would be to perform a string replace on your delimiter, along these lines (assuming a comma as the delimiter):

    String marked = fullString.replace(",", "~,~");

Here you can replace the tilde (~) with any marker string that is guaranteed not to occur in the input.

If you then split on that marker, I believe you will get the desired result.
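The replace-then-split idea above can be sketched as follows. This is only an illustration under stated assumptions: the class MarkerSplit and the helper splitWithMarker are hypothetical names, and the sketch assumes the chosen marker never appears in the input (it throws if it does).

```java
import java.util.regex.Pattern;

public class MarkerSplit {
    // Hypothetical helper: wraps every delimiter in a marker, then splits on the marker.
    // Assumption: `marker` does not occur anywhere in `input`.
    static String[] splitWithMarker(String input, String delimiter, String marker) {
        if (input.contains(marker)) {
            throw new IllegalArgumentException("marker occurs in input: " + marker);
        }
        // "a,b,c" becomes "a~,~b~,~c" when delimiter is "," and marker is "~".
        String marked = input.replace(delimiter, marker + delimiter + marker);
        // Quote the marker so it is treated as a literal, not as regex syntax.
        return marked.split(Pattern.quote(marker));
    }

    public static void main(String[] args) {
        String[] parts = splitWithMarker("a,b,c", ",", "~");
        // prints a|,|b|,|c
        System.out.println(String.join("|", parts));
    }
}
```

Unlike the lookaround patterns, this version yields an empty string between two adjacent delimiters (for example "a,,b" gives a, ",", "", ",", b), so it may need filtering depending on the input.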