
python, windows : parsing command lines with shlex


As of March 2016, the Python stdlib has no valid command-line splitting function for Windows or for multi-platform use.

subprocess

So in short, for subprocess.Popen, subprocess.call etc. it is best to do this:

    import shlex
    import subprocess
    import sys

    if sys.platform == 'win32':
        args = cmd
    else:
        args = shlex.split(cmd)
    subprocess.Popen(args, ...)

On Windows the split is not necessary for either value of the shell option; internally Popen just uses subprocess.list2cmdline to re-join the split arguments anyway :-) .
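The re-joining step can be observed directly on any platform by calling subprocess.list2cmdline. It is an undocumented helper, so the following is only an illustrative sketch; the argument values are made up for the example.

```python
import subprocess

# list2cmdline is the helper Popen uses on Windows to turn an argument
# list back into one command-line string. The sample arguments below
# are hypothetical.
args = ['prog', 'a b', 'say "hi"']
cmdline = subprocess.list2cmdline(args)
print(cmdline)   # prog "a b" "say \"hi\""
```

Arguments containing whitespace get wrapped in double quotes, and embedded double quotes are backslash-escaped, so a split done beforehand is simply undone again.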

With shell=True, shlex.split is not necessary on Unix either.

Split or not, on Windows you need to include the file extension explicitly when starting .bat or .cmd scripts (unlike .exe or .com files), unless shell=True.

Notes on command-line splitting nonetheless:

shlex.split(cmd, posix=0) retains backslashes in Windows paths, but it doesn't handle quoting and escaping correctly. It's not very clear what the posix=0 mode of shlex is good for at all, but it almost certainly misleads Windows/cross-platform programmers ...
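A small sketch of the difference, using made-up sample strings: the default POSIX mode eats the backslashes of a Windows path, while the non-POSIX mode keeps them but also leaves the surrounding quotes inside the token, so the result is not a usable argv either.

```python
import shlex

path_cmd = r'C:\dir\file.txt'   # hypothetical Windows path
quoted_cmd = '"a b" c'          # hypothetical quoted argument

# POSIX mode: each backslash is consumed as an escape character.
posix_path = shlex.split(path_cmd)

# Non-POSIX mode: backslashes survive ...
nonposix_path = shlex.split(path_cmd, posix=False)

# ... but the quotes are kept as part of the token.
nonposix_quoted = shlex.split(quoted_cmd, posix=False)

print(posix_path)        # ['C:dirfile.txt']
print(nonposix_path)     # ['C:\\dir\\file.txt']
print(nonposix_quoted)   # ['"a b"', 'c']
```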

The Windows API exposes CommandLineToArgvW, reachable from Python as ctypes.windll.shell32.CommandLineToArgvW:

Parses a Unicode command line string and returns an array of pointers to the command line arguments, along with a count of such arguments, in a way that is similar to the standard C run-time argv and argc values.

    def win_CommandLineToArgvW(cmd):
        import ctypes
        nargs = ctypes.c_int()
        ctypes.windll.shell32.CommandLineToArgvW.restype = ctypes.POINTER(ctypes.c_wchar_p)
        lpargs = ctypes.windll.shell32.CommandLineToArgvW(unicode(cmd), ctypes.byref(nargs))
        args = [lpargs[i] for i in range(nargs.value)]
        if ctypes.windll.kernel32.LocalFree(lpargs):
            raise AssertionError
        return args

However, CommandLineToArgvW is bogus, or at best only weakly similar to the standard C argv & argc parsing. Compare its results with what Python actually receives in sys.argv for the same command lines:

    >>> win_CommandLineToArgvW('aaa"bbb""" ccc')
    [u'aaa"bbb"""', u'ccc']
    >>> win_CommandLineToArgvW('""  aaa"bbb""" ccc')
    [u'', u'aaabbb" ccc']

    C:\scratch>python -c "import sys;print(sys.argv)" aaa"bbb""" ccc
    ['-c', 'aaabbb"', 'ccc']
    C:\scratch>python -c "import sys;print(sys.argv)" ""  aaa"bbb""" ccc
    ['-c', '', 'aaabbb"', 'ccc']

Watch http://bugs.python.org/issue1724822 for possible future additions to the Python library. (The function on the "fisheye3" server mentioned there doesn't really work correctly.)


Cross-platform candidate function

Valid Windows command-line splitting is rather crazy. E.g. try \ \\ \" \\"" \\\"aaa """" ...
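Part of the craziness is the backslash rule, which the docstring of subprocess.list2cmdline summarizes: backslashes are literal unless they immediately precede a double quote; before a quote, each pair yields one literal backslash, and an odd one left over escapes the quote. Below is a minimal sketch of just that rule. The helper name resolve_backslash_run is hypothetical, and the handling of doubled quotes ("") is deliberately left out.

```python
def resolve_backslash_run(count, before_quote):
    """Resolve a run of `count` backslashes (hypothetical helper).

    Returns (literal_backslashes, quote_is_literal), where
    quote_is_literal says whether a following double quote is an
    escaped, literal quote instead of a quoting delimiter.
    """
    if not before_quote:
        # Not followed by a double quote: all backslashes are literal.
        return '\\' * count, False
    # Followed by a double quote: pairs collapse to single backslashes,
    # and a remaining odd backslash escapes the quote.
    return '\\' * (count // 2), count % 2 == 1


for count in (1, 2, 3):
    print(count, resolve_backslash_run(count, before_quote=True))
```

So `\"` is a literal quote, `\\"` is one backslash followed by a real quoting delimiter, and `\\\"` is one backslash followed by a literal quote.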

My current candidate for cross-platform command-line splitting is the following function, which I am considering as a proposal for the Python lib. It's multi-platform; it's ~10x faster than shlex, which does single-char stepping and streaming; and it also respects pipe-related characters (unlike shlex). It already passes a list of tough real-shell tests on Windows and Linux bash, plus the legacy POSIX test patterns of test_shlex. I'm interested in feedback about remaining bugs.

    import re
    import sys

    def cmdline_split(s, platform='this'):
        """Multi-platform variant of shlex.split() for command-line splitting.
        For use with subprocess, for argv injection etc. Using fast REGEX.

        platform: 'this' = auto from current platform;
                  1 = POSIX;
                  0 = Windows/CMD
                  (other values reserved)
        """
        if platform == 'this':
            platform = (sys.platform != 'win32')
        if platform == 1:
            RE_CMD_LEX = r'''"((?:\\["\\]|[^"])*)"|'([^']*)'|(\\.)|(&&?|\|\|?|\d?\>|[<])|([^\s'"\\&|<>]+)|(\s+)|(.)'''
        elif platform == 0:
            RE_CMD_LEX = r'''"((?:""|\\["\\]|[^"])*)"?()|(\\\\(?=\\*")|\\")|(&&?|\|\|?|\d?>|[<])|([^\s"&|<>]+)|(\s+)|(.)'''
        else:
            raise AssertionError('unknown platform %r' % platform)

        args = []
        accu = None   # collects pieces of one arg
        for qs, qss, esc, pipe, word, white, fail in re.findall(RE_CMD_LEX, s):
            if word:
                pass   # most frequent
            elif esc:
                word = esc[1]
            elif white or pipe:
                if accu is not None:
                    args.append(accu)
                if pipe:
                    args.append(pipe)
                accu = None
                continue
            elif fail:
                raise ValueError("invalid or incomplete shell string")
            elif qs:
                word = qs.replace('\\"', '"').replace('\\\\', '\\')
                if platform == 0:
                    word = word.replace('""', '"')
            else:
                word = qss   # may be even empty; must be last

            accu = (accu or '') + word

        if accu is not None:
            args.append(accu)

        return args
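For anyone who wants to hunt for remaining bugs, here is one possible way to compare a candidate splitter against a reference on a list of sample strings. This is only a sketch: compare_splitters and the sample strings are hypothetical, and str.split stands in as a deliberately naive candidate so that the block runs on its own.

```python
import shlex

def compare_splitters(candidate, reference, samples):
    """Return (sample, candidate_result, reference_result) for each mismatch."""
    mismatches = []
    for sample in samples:
        results = []
        for split in (candidate, reference):
            try:
                results.append(split(sample))
            except ValueError:
                # Use the exception class as a marker, so two splitters
                # that both reject a sample still count as agreeing.
                results.append(ValueError)
        if results[0] != results[1]:
            mismatches.append((sample, results[0], results[1]))
    return mismatches

# Hypothetical samples; str.split knows nothing about quoting.
samples = ['a b c', 'a "b c" d', "x 'y z'"]
mismatches = compare_splitters(str.split, shlex.split, samples)
for sample, got, expected in mismatches:
    print('%r: %r != %r' % (sample, got, expected))
```

To check the function above against shlex on POSIX patterns, pass `lambda s: cmdline_split(s, platform=1)` as the candidate. Mismatches on samples containing pipe-related characters are expected, since shlex does not treat them specially.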