shlex alternative for Java
I had a similar problem today, and none of the standard options, such as StringTokenizer, StrTokenizer or Scanner, looked like a good fit. However, it's not hard to implement the basics.
This example handles all the edge cases raised in comments on the other answers. Be warned: I haven't checked it for full POSIX compliance yet. A Gist including unit tests is available on GitHub, released into the public domain via the Unlicense.
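import java.util.ArrayList;
import java.util.List;

public List<String> shellSplit(CharSequence string) {
    List<String> tokens = new ArrayList<String>();
    boolean escaping = false;
    char quoteChar = ' ';
    boolean quoting = false;
    // Index of the most recent closing quote, so that an empty quoted
    // string such as "" or '' still produces an (empty) token.
    int lastCloseQuoteIndex = Integer.MIN_VALUE;
    StringBuilder current = new StringBuilder();
    for (int i = 0; i < string.length(); i++) {
        char c = string.charAt(i);
        if (escaping) {
            // The previous character was a backslash: take this one literally.
            current.append(c);
            escaping = false;
        } else if (c == '\\' && !(quoting && quoteChar == '\'')) {
            // A backslash escapes the next character, except inside single quotes.
            escaping = true;
        } else if (quoting && c == quoteChar) {
            // Closing quote.
            quoting = false;
            lastCloseQuoteIndex = i;
        } else if (!quoting && (c == '\'' || c == '"')) {
            // Opening quote.
            quoting = true;
            quoteChar = c;
        } else if (!quoting && Character.isWhitespace(c)) {
            // Unquoted whitespace ends the current token, if there is one.
            if (current.length() > 0 || lastCloseQuoteIndex == (i - 1)) {
                tokens.add(current.toString());
                current = new StringBuilder();
            }
        } else {
            current.append(c);
        }
    }
    // Note: an unterminated quote or a trailing backslash is silently accepted, not reported.
    if (current.length() > 0 || lastCloseQuoteIndex == (string.length() - 1)) {
        tokens.add(current.toString());
    }
    return tokens;
}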
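One place where the shellSplit method above differs from Python's shlex in POSIX mode is a backslash inside double quotes: shlex only treats it as an escape there when it is followed by the quote character or another backslash, and otherwise keeps it as a literal character. Below is a minimal sketch of that one tweak, assuming every other rule stays as in shellSplit. The class name `ShellSplitter` and method name `shellSplitShlexQuotes` are hypothetical, and this still has not been checked for full POSIX compliance.

```java
import java.util.ArrayList;
import java.util.List;

class ShellSplitter {

    // Hypothetical variant of shellSplit: same rules, except that inside
    // double quotes a backslash only escapes '"' or '\' (like Python shlex
    // in POSIX mode); before any other character it is kept literally.
    static List<String> shellSplitShlexQuotes(CharSequence string) {
        List<String> tokens = new ArrayList<>();
        boolean escaping = false;
        boolean quoting = false;
        char quoteChar = ' ';
        int lastCloseQuoteIndex = Integer.MIN_VALUE;
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < string.length(); i++) {
            char c = string.charAt(i);
            if (escaping) {
                // Inside double quotes, keep the backslash unless it escapes '"' or '\'.
                if (quoting && quoteChar == '"' && c != '"' && c != '\\') {
                    current.append('\\');
                }
                current.append(c);
                escaping = false;
            } else if (c == '\\' && !(quoting && quoteChar == '\'')) {
                // Backslash is never special inside single quotes.
                escaping = true;
            } else if (quoting && c == quoteChar) {
                quoting = false;
                lastCloseQuoteIndex = i;
            } else if (!quoting && (c == '\'' || c == '"')) {
                quoting = true;
                quoteChar = c;
            } else if (!quoting && Character.isWhitespace(c)) {
                // Unquoted whitespace ends a token; "" or '' yields an empty token.
                if (current.length() > 0 || lastCloseQuoteIndex == i - 1) {
                    tokens.add(current.toString());
                    current = new StringBuilder();
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0 || lastCloseQuoteIndex == string.length() - 1) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Input text: echo "a\nb" "say \"hi\"" ''
        System.out.println(shellSplitShlexQuotes("echo \"a\\nb\" \"say \\\"hi\\\"\" ''"));
    }
}
```

Running `main` prints `[echo, a\nb, say "hi", ]`: the backslash before `n` survives because `n` is neither `"` nor `\`, while the escaped quotes become plain quotes. The original shellSplit would return `anb` for the second token instead.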
Look at Apache Commons Lang:
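org.apache.commons.lang.text.StrTokenizer should be able to do what you want (in Commons Lang 3 the class is org.apache.commons.lang3.text.StrTokenizer):
import org.apache.commons.lang.text.StrTokenizer;

// Split on spaces, treating double quotes as the quote character.
String[] tokens = new StrTokenizer("one two \"three four\"", ' ', '"').getTokenArray();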
I had success with the following Scala code, which uses fastparse to parse a comma-separated list of key=value pairs. I can't vouch for it being complete:
val kvParser = {
  import fastparse._
  import NoWhitespace._

  def nonQuoteChar[_: P] = P(CharPred(_ != '"'))
  def quotedQuote[_: P] = P("\\\"")
  // Try the escaped quote first; otherwise nonQuoteChar consumes the
  // backslash and the following '"' ends the string early.
  def quotedElement[_: P] = P(quotedQuote | nonQuoteChar)
  def quotedContent[_: P] = P(quotedElement.rep)
  // .! captures the raw text, so an escaped \" is returned as-is, not unescaped.
  def quotedString[_: P] = P("\"" ~/ quotedContent.! ~ "\"")

  def alpha[_: P] = P(CharIn("a-zA-Z"))
  def digit[_: P] = P(CharIn("0-9"))
  def hyphen[_: P] = P("-")
  def underscore[_: P] = P("_")
  def bareStringChar[_: P] = P(alpha | digit | hyphen | underscore)
  def bareString[_: P] = P(bareStringChar.rep.!)

  def string[_: P] = P(quotedString | bareString)
  def kvPair[_: P] = P(string ~ "=" ~ string)
  def commaAndSpace[_: P] = P(CharIn(" \t\n\r").rep ~ "," ~ CharIn(" \t\n\r").rep)
  def kvPairList[_: P] = P(kvPair.rep(sep = commaAndSpace))
  def fullLang[_: P] = P(kvPairList ~ End)

  def res(str: String) = parse(str, fullLang(_))
  res _
}
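For readers who would rather not pull fastparse into a Java project, the same key=value grammar can be hand-written as a small recursive-descent parser: bare strings of letters, digits, `-` and `_` (possibly empty), or double-quoted strings where `\"` is an escaped quote, with pairs separated by a comma and optional whitespace around it. This is a rough sketch under those assumptions; the class name `KvParser` and method `parse` are hypothetical, and unlike the Scala version it unescapes `\"` in quoted values and reports errors by throwing `IllegalArgumentException`.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class KvParser {
    private final String s;
    private int pos = 0;

    private KvParser(String s) { this.s = s; }

    // Parses input such as: a=1, name="say \"hi\"" into ordered (key, value) pairs.
    static List<Map.Entry<String, String>> parse(String input) {
        KvParser p = new KvParser(input);
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        if (input.isEmpty()) {
            return pairs; // zero pairs is allowed, as with kvPair.rep
        }
        while (true) {
            String key = p.string();
            p.expect('=');
            String value = p.string();
            pairs.add(new AbstractMap.SimpleImmutableEntry<>(key, value));
            if (p.pos == input.length()) {
                return pairs; // reached end of input
            }
            // Separator: optional whitespace, a comma, optional whitespace.
            p.skipWhitespace();
            p.expect(',');
            p.skipWhitespace();
        }
    }

    // string = quotedString | bareString
    private String string() {
        if (pos < s.length() && s.charAt(pos) == '"') {
            return quoted();
        }
        int start = pos;
        while (pos < s.length() && isBareChar(s.charAt(pos))) {
            pos++;
        }
        return s.substring(start, pos); // may be empty, as in the Scala grammar
    }

    private String quoted() {
        pos++; // skip opening quote
        StringBuilder sb = new StringBuilder();
        while (pos < s.length()) {
            char c = s.charAt(pos);
            if (c == '\\' && pos + 1 < s.length() && s.charAt(pos + 1) == '"') {
                sb.append('"'); // \" becomes a literal quote
                pos += 2;
            } else if (c == '"') {
                pos++; // closing quote
                return sb.toString();
            } else {
                sb.append(c);
                pos++;
            }
        }
        throw new IllegalArgumentException("Unterminated quoted string");
    }

    private static boolean isBareChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || c == '-' || c == '_';
    }

    private void expect(char c) {
        if (pos >= s.length() || s.charAt(pos) != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at index " + pos);
        }
        pos++;
    }

    private void skipWhitespace() {
        while (pos < s.length() && " \t\n\r".indexOf(s.charAt(pos)) >= 0) {
            pos++;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("a=1, name=\"say \\\"hi\\\"\",x-y=z_1"));
    }
}
```

Running `main` prints `[a=1, name=say "hi", x-y=z_1]`. As in the Scala grammar, whitespace around `=` and pairs separated only by spaces (for example `a=1 b=2`) are rejected.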