shlex alternative for Java


I had a similar problem today, and none of the usual options, such as StringTokenizer, StrTokenizer, or Scanner, looked like a good fit. However, it's not hard to implement the basics.

This example handles all the edge cases raised so far in comments on the other answers. Be warned, I haven't checked it for full POSIX compliance yet. A gist including unit tests is available on GitHub, released into the public domain via the Unlicense.

    // Requires: import java.util.ArrayList; import java.util.List;
    public List<String> shellSplit(CharSequence string) {
        List<String> tokens = new ArrayList<String>();
        boolean escaping = false;
        char quoteChar = ' ';
        boolean quoting = false;
        // Index of the most recent closing quote, so that '' or "" still yields an empty token.
        int lastCloseQuoteIndex = Integer.MIN_VALUE;
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < string.length(); i++) {
            char c = string.charAt(i);
            if (escaping) {
                // The previous character was a backslash: take this one literally.
                current.append(c);
                escaping = false;
            } else if (c == '\\' && !(quoting && quoteChar == '\'')) {
                // A backslash escapes the next character, except inside single quotes.
                escaping = true;
            } else if (quoting && c == quoteChar) {
                // Closing quote.
                quoting = false;
                lastCloseQuoteIndex = i;
            } else if (!quoting && (c == '\'' || c == '"')) {
                // Opening quote.
                quoting = true;
                quoteChar = c;
            } else if (!quoting && Character.isWhitespace(c)) {
                // Unquoted whitespace ends the current token, if there is one.
                if (current.length() > 0 || lastCloseQuoteIndex == (i - 1)) {
                    tokens.add(current.toString());
                    current = new StringBuilder();
                }
            } else {
                current.append(c);
            }
        }
        // Flush the final token, including a trailing empty quoted string.
        if (current.length() > 0 || lastCloseQuoteIndex == (string.length() - 1)) {
            tokens.add(current.toString());
        }
        return tokens;
    }
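To make the edge cases concrete, here is a minimal sketch that runs the method on a few inputs. The class name ShellSplitDemo and the sample strings are my own assumptions, not part of the gist; the method body is a compact copy of the code above, made static so the block runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; shellSplit is copied from the answer above.
final class ShellSplitDemo {
    static List<String> shellSplit(CharSequence string) {
        List<String> tokens = new ArrayList<>();
        boolean escaping = false;
        char quoteChar = ' ';
        boolean quoting = false;
        int lastCloseQuoteIndex = Integer.MIN_VALUE;
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < string.length(); i++) {
            char c = string.charAt(i);
            if (escaping) {
                current.append(c);
                escaping = false;
            } else if (c == '\\' && !(quoting && quoteChar == '\'')) {
                escaping = true;
            } else if (quoting && c == quoteChar) {
                quoting = false;
                lastCloseQuoteIndex = i;
            } else if (!quoting && (c == '\'' || c == '"')) {
                quoting = true;
                quoteChar = c;
            } else if (!quoting && Character.isWhitespace(c)) {
                if (current.length() > 0 || lastCloseQuoteIndex == (i - 1)) {
                    tokens.add(current.toString());
                    current = new StringBuilder();
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0 || lastCloseQuoteIndex == (string.length() - 1)) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Double quotes keep the embedded space in one token.
        System.out.println(shellSplit("one two \"three four\"")); // [one, two, three four]
        // An empty quoted string is kept as an empty token.
        System.out.println(shellSplit("a '' b").size());          // 3
        // A backslash outside quotes escapes the following space.
        System.out.println(shellSplit("a\\ b c"));                // [a b, c]
        // Inside single quotes the backslash is taken literally.
        System.out.println(shellSplit("'a\\b'"));                 // [a\b]
        // Inside double quotes a backslash can escape the quote character.
        System.out.println(shellSplit("\"say \\\"hi\\\"\""));     // [say "hi"]
    }
}
```

One difference from a real shell that I noticed while writing this: inside double quotes the method drops a backslash before any character, whereas POSIX shells only treat it as an escape before a handful of special characters. That is in line with the warning above about POSIX compliance.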


Look at Apache Commons Lang:

org.apache.commons.lang.text.StrTokenizer should be able to do what you want:

    new StrTokenizer("one two \"three four\"", ' ', '"').getTokenArray();


I had success with the following Scala code, which uses fastparse. I can't vouch for it being complete:

    val kvParser = {
      import fastparse._
      import NoWhitespace._

      def nonQuoteChar[_:P] = P(CharPred(_ != '"'))
      def quotedQuote[_:P] = P("\\\"")
      // Try the escaped quote first; otherwise nonQuoteChar consumes the
      // backslash and the following quote ends the string early.
      def quotedElement[_:P] = P(quotedQuote | nonQuoteChar)
      def quotedContent[_:P] = P(quotedElement.rep)
      def quotedString[_:P] = P("\"" ~/ quotedContent.! ~ "\"")

      def alpha[_:P] = P(CharIn("a-zA-Z"))
      def digit[_:P] = P(CharIn("0-9"))
      def hyphen[_:P] = P("-")
      def underscore[_:P] = P("_")
      def bareStringChar[_:P] = P(alpha | digit | hyphen | underscore)
      def bareString[_:P] = P(bareStringChar.rep.!)

      def string[_:P] = P(quotedString | bareString)
      def kvPair[_:P] = P(string ~ "=" ~ string)
      def commaAndSpace[_:P] = P(CharIn(" \t\n\r").rep ~ "," ~ CharIn(" \t\n\r").rep)
      def kvPairList[_:P] = P(kvPair.rep(sep = commaAndSpace))
      def fullLang[_:P] = P(kvPairList ~ End)

      // Parse a whole input string as a comma-separated list of key=value pairs.
      def res(str: String) = {
        parse(str, fullLang(_))
      }
      res _
    }