
Cross-platform, safe to use command line string separator


The best solution for us was to use a platform-dependent separator:

  • Windows: ;

  • Unix: :

A bit tricky to document, but a clean and safe solution.
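As a rough illustration, here is how a program might apply this choice. This is only a sketch, assuming the program is written in Python; `split_path_list` is a made-up helper name, while `os.pathsep` is the standard constant that is `;` on Windows and `:` on Unix.

```python
import os


def split_path_list(value, sep=os.pathsep):
    """Split a separator-joined list of paths, dropping empty entries.

    `sep` defaults to the platform's own path-list separator
    (';' on Windows, ':' on Unix).
    """
    return [part for part in value.split(sep) if part]


# Build a list with the platform separator and split it again.
joined = os.pathsep.join(["lib", "vendor/lib"])
print(split_path_list(joined))
```

Passing `sep` explicitly makes it easy to test both conventions on one machine.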


The real problem and its solution

Your question is, to some extent, an instance of the XY problem. A red herring at least.

As I show below, no ideal path delimiters exist, and therefore you have to pass that information in separate command-line options, if you really insist on supporting arbitrarily crazy paths. It is up to the users, then, to escape their weird characters in paths when calling your program.
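One way to pass each path in its own command-line option is sketched here. It assumes a Python program using the standard `argparse` module; the option name `--path` and the function `parse_args` are hypothetical, chosen only for illustration.

```python
import argparse


def parse_args(argv):
    """Collect every occurrence of --path into a list, one path each."""
    parser = argparse.ArgumentParser()
    # "--path" is a hypothetical option name; action="append" keeps
    # each value intact, so no delimiter character is ever needed.
    parser.add_argument("--path", action="append", default=[], dest="paths")
    return parser.parse_args(argv)


# Each path is its own argument, so ':' and ';' inside a path are harmless.
args = parse_args(["--path", "a:b;c", "--path", "weird name,@=+"])
print(args.paths)
```

The shell still has to see each path as one word, so quoting or escaping on the command line remains the user's job.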

No ideal path delimiter exists

Unix paths can contain any characters except ASCII NUL (\0). Path components (file names) are not allowed to contain slash (/). Anything else is OK, according to POSIX.

Therefore, your constraints are too tight. No ideal solution to your problem exists even on Unix, completely ignoring the portability issue.
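The rule can be written down as a small check. This is a minimal sketch in Python that models only the two restrictions stated above (no NUL, no slash inside a component); `is_valid_component` is a made-up name, and real file systems may impose further limits such as a maximum length.

```python
def is_valid_component(name):
    """Return True if `name` is acceptable as a single Unix file name.

    Models only the rule described above: the name must be non-empty
    and must not contain NUL or '/'.
    """
    return bool(name) and "\0" not in name and "/" not in name


# Every popular delimiter candidate is a legal part of a file name.
for candidate in ":;,@=+%^":
    print(repr(candidate), is_valid_component("a" + candidate + "b"))
```

Whatever delimiter you choose, some legal file name can contain it.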

Good path delimiters

You have to put some “common sense” constraints on paths, e.g. that they will not contain a semicolon on Windows or a colon on Unix. This combination is quite natural, intuitive and easy to read, by the way, because these characters already separate the entries of path lists such as PATH on these systems.

Let’s find out whether you can reserve just one character that may never occur in a path. Will the set of constraints be satisfiable then?

If you list non-alphanumeric printable ASCII characters and remove those with special meaning for Unix shell and those used in paths even by sane people (_, -, etc.), you can pick a reasonable path delimiter:

LC_ALL=C awk 'BEGIN{ for (i=1;i<ARGC;i++) printf "%c\n", ARGV[i]; }' {1..127} |
    grep '^[[:print:]]$' |
    grep '^[^][*?~$`"'\''&|#\<>(){}!;/[:alnum:] ._-]$'

ASCII is 0..127, but 0 is excluded because it causes trouble with text-oriented utilities. Bash specials are filtered out, too.

The resulting set contains just seven characters, though: %+,:=@^

Aaah, percent (%) and caret (^) unfortunately have special meaning in cmd.exe and colon (:) in Windows paths. Only four remaining: +,=@
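The same filtering can be cross-checked outside the shell. The sketch below is in Python; the set names are made up for illustration, and the sets simply restate the characters excluded by the pipeline above plus the three characters ruled out for Windows.

```python
# Characters excluded by the shell pipeline: Bash specials ...
SHELL_SPECIALS = set("][*?~$`\"'&|#\\<>(){}!;/")
# ... and characters that sane people do use in paths.
COMMON_IN_PATHS = set(" ._-")
# Special in cmd.exe (% and ^) or in Windows paths (:).
WINDOWS_SPECIALS = set("%^:")

# Printable ASCII, skipping 0 as in the pipeline.
printable = [chr(i) for i in range(1, 128) if chr(i).isprintable()]

unix_candidates = "".join(
    c for c in printable
    if not c.isalnum() and c not in SHELL_SPECIALS | COMMON_IN_PATHS
)
portable_candidates = "".join(
    c for c in unix_candidates if c not in WINDOWS_SPECIALS
)

print(unix_candidates)      # %+,:=@^
print(portable_candidates)  # +,=@
```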

Either you pick one of those, or you assume they are inconvenient and you revise the list of specials to pick a different character for different systems (e.g. the colon and semicolon compromise you have suggested), which relaxes the portability constraint a bit. Or maybe the tilde (~) is not that special in the shell, as it is expanded to the home directory path only at the start of a shell word. Or maybe you do not want a separator character, but a separator string: you can guess that very few files have @@@ in their names.
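The separator-string idea could look roughly like this. It is a sketch in Python, with `SEPARATOR`, `join_paths` and `split_paths` as made-up names. The join step refuses any list that would not split back into the same paths, which covers both a path that contains @@@ and a path that merely ends or starts with @.

```python
SEPARATOR = "@@@"  # assumed to be rare in real file names


def split_paths(value, sep=SEPARATOR):
    """Split a joined string back into individual paths."""
    return value.split(sep)


def join_paths(paths, sep=SEPARATOR):
    """Join paths with the separator string.

    Raises ValueError when the result would be ambiguous, i.e. when
    splitting it again would not return the original paths.
    """
    paths = list(paths)
    joined = sep.join(paths)
    if split_paths(joined, sep) != paths:
        raise ValueError("paths cannot be joined unambiguously with %r" % sep)
    return joined


print(split_paths(join_paths(["a:b", "c;d", "e,f"])))
```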