What character sequence should I not allow in a filename?


Your question is somewhat confusing: you talk at length about Linux, but in a comment on another answer you say that you are generating filenames for people to download. That presumably means you have no control whatsoever over the filesystem and operating system the files will be stored on, which makes Linux completely irrelevant.

For the purpose of this answer I'm going to assume that your question is wrong and your comment is correct.

The vast majority of operating systems and filesystems in use today fall roughly into three categories: POSIX, Windows and MacOS.

The POSIX specification is very clear on what a filename that is guaranteed to be portable across all POSIX systems looks like. The characters that you can use are defined in Section 3.276 (Portable Filename Character Set) of the Open Group Base Specification as:

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789._-

The maximum filename length that you can rely on is defined in Section 13.23.3.5 (<limits.h> Minimum Values) as 14. (The relevant constant is _POSIX_NAME_MAX.)

So, a filename that is up to 14 characters long and contains only the 65 characters listed above is safe to use on all POSIX-compliant systems, which gives you 24407335764928225040435790 combinations (or roughly 84 bits).

If you don't want to annoy your users, you should add two more restrictions: don't start the filename with a dash or a dot. Filenames starting with a dot are customarily interpreted as "hidden" files and are not displayed in directory listings unless explicitly requested. And filenames starting with a dash may be interpreted as an option by many commands. (Sidenote: it is amazing how many users don't know about the rm ./-rf or rm -- -rf tricks.)

This leaves you at 23656340818315048885345458 combinations (still 84 bits).
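The rules so far (portable character set, at most 14 characters, no leading dash or dot) fit in one regular expression. Here is a minimal Python sketch; the function name `is_portable_posix_name` is made up for illustration and is not part of any standard.

```python
import re

# Portable Filename Character Set, 1 to 14 characters,
# and the first character may not be '-' or '.'.
PORTABLE_NAME = re.compile(r"[A-Za-z0-9_][A-Za-z0-9._-]{0,13}")


def is_portable_posix_name(name: str) -> bool:
    """Return True if `name` follows the conservative POSIX rules above."""
    # fullmatch() rejects anything with extra characters before or after.
    return PORTABLE_NAME.fullmatch(name) is not None
```

With this, `is_portable_posix_name("notes-01.txt")` is accepted, while `"-rf"`, `".hidden"` and any 15-character name are rejected.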

Windows adds a couple of new restrictions to this: filenames cannot end with a dot, and filenames are case-insensitive. This reduces the character set from 65 to 39 characters (37 for the first, 38 for the last character). It doesn't add any length restrictions; Windows can deal with 14 characters just fine.

This reduces the possible combinations to 17866587696996781449603 (73 bits).

Another restriction is that Windows treats everything after the last dot as a filename extension which denotes the type of the file. If you want to avoid potential confusion (say, if you generate a filename like abc.mp3 for a text file), you should avoid dots altogether.

You still have 13090925539866773438463 combinations (73 bits).
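Case-insensitivity matters mostly when you generate many names: two names that differ only in case end up as the same file. A small Python sketch for spotting such clashes in a batch of generated names follows; `find_case_clashes` is a hypothetical helper, and lower-casing is only assumed to be adequate because the character set here is plain ASCII.

```python
from collections import defaultdict


def find_case_clashes(names):
    """Return groups of names that a case-insensitive filesystem would treat as one file."""
    groups = defaultdict(list)
    for name in names:
        # Lower-casing is enough for the ASCII-only character set discussed here.
        groups[name.lower()].append(name)
    return [group for group in groups.values() if len(group) > 1]
```

Generating names in a single case from the start avoids the problem entirely, which is what the reduced 39-character set amounts to.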

If you have to worry about DOS, then additional restrictions apply: the filename consists of one or two parts (separated by a dot), where neither of the two parts can contain a dot. The first part has a maximum length of 8 characters, the second of 3. Again, the second part is usually reserved to indicate the file type, which leaves you only 8 characters.

Now you have 4347792138495 possible filenames or 41 bits.

The good news is that you can use the 3 character extension to actually correctly indicate the file type, without breaking the POSIX filename limit (8+3+1 = 12 < 14).
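Putting the DOS rules together, an 8.3 name could be assembled roughly like this. It is only a sketch: `make_8_3_name` is a made-up helper, and forcing upper case is my assumption for getting a single-case name, not something DOS itself demands of the caller.

```python
import re

# No dots inside either part, no leading hyphen in the base name.
BASE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_-]{0,7}")
EXT = re.compile(r"[A-Za-z0-9_-]{1,3}")


def make_8_3_name(base: str, ext: str) -> str:
    """Build BASE.EXT with at most 8 + 3 characters, or raise ValueError."""
    if not BASE.fullmatch(base):
        raise ValueError(f"base name {base!r} does not fit the 8-character rules")
    if not EXT.fullmatch(ext):
        raise ValueError(f"extension {ext!r} does not fit the 3-character rules")
    # Names are case-insensitive, so settle on one case.
    return f"{base.upper()}.{ext.upper()}"
```

For example, `make_8_3_name("report01", "txt")` gives `REPORT01.TXT`, which is 12 characters and therefore still within the POSIX limit of 14.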

If you want your users to be able to burn the files onto a CD-R formatted with ISO9660 Level 1, then you have to disallow hyphens anywhere, not just as the first character. Now, the remaining character set looks like:

ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_

which gives you 3512479453921 combinations (41 bits) for names of exactly 8 characters.
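If you want to check the arithmetic, the totals above can be re-derived by counting names length by length. This is a sketch in Python; `count_names` and its parameters are my own naming, and the per-position alphabet sizes are the ones stated in the text.

```python
def count_names(first: int, middle: int, last: int, max_len: int) -> int:
    """Count names of length 1..max_len.

    first, middle and last are the numbers of characters allowed in the
    first, interior and final position. Assumes every character allowed in
    the first position is also allowed in the last one (true for all cases
    here), so a one-character name has `first` choices.
    """
    total = first
    for length in range(2, max_len + 1):
        total += first * middle ** (length - 2) * last
    return total


counts = {
    "POSIX portable set": count_names(65, 65, 65, 14),
    "no leading dash or dot": count_names(63, 65, 65, 14),
    "Windows (one case, no trailing dot)": count_names(37, 39, 38, 14),
    "no dots at all": count_names(37, 38, 38, 14),
    "DOS base name (up to 8 characters)": count_names(37, 38, 38, 8),
    # The ISO 9660 figure quoted above is 37**8, i.e. names of exactly 8 characters.
    "ISO 9660 Level 1 (exactly 8 characters)": 37 ** 8,
}

for label, n in counts.items():
    # bit_length() - 1 is floor(log2(n)): the whole number of bits a name can carry.
    print(f"{label}: {n} ({n.bit_length() - 1} bits)")
```

Counting ISO 9660 names of 1 to 8 characters in the same way as the other totals would be `count_names(37, 37, 37, 8)`, which is slightly larger but still 41 bits.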


I would leave the determination of what's "valid" up to the OS and filesystem driver. Let the user type whatever they want, and pass it on. Handle errors from the OS in an appropriate manner. The one exception: I think it's reasonable to strip leading and trailing spaces. If people want to create filenames with embedded spaces or leading dashes or question marks, and their chosen filesystem allows it, it shouldn't be up to you to try to prevent them.

It's possible to mount different filesystems at different mount points (or drives in Windows) that have different rules regarding legal characters in a file name. Handling this sort of thing inside your application will be much more work than is necessary, because the OS will already do it for you.
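In code, "let the OS decide" might look something like the following Python sketch. `try_create` is a hypothetical helper; it assumes `name` is a single path component and makes no attempt to handle path separators typed by the user.

```python
from pathlib import Path
from typing import Optional


def try_create(directory: Path, name: str) -> Optional[str]:
    """Create an empty file; return None on success or a message for the user."""
    name = name.strip(" ")  # the one clean-up suggested above
    try:
        # "x" means exclusive creation: fail instead of overwriting.
        with open(directory / name, "x"):
            pass
    except (OSError, ValueError) as exc:
        # OSError: the OS or filesystem refused the name (or the file exists).
        # ValueError: Python itself rejects names with an embedded NUL byte.
        return f"Cannot create {name!r}: {exc}"
    return None
```

The point is that the application carries no table of forbidden characters at all; whatever the target filesystem rejects comes back as an error you can show to the user.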


Since you seem to be interested primarily in Linux, one thing to avoid is characters that the (typical) shell will try to interpret, for example, as a wildcard. You can create a file named "*" if you insist, but you might have some users who don't appreciate it much.
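One rough way to spot such names is to ask whether a POSIX shell would need them quoted. The Python sketch below leans on the standard library's `shlex.quote`, which returns its argument unchanged only when it consists of characters the shell leaves alone; `is_shell_neutral` is a made-up name, and the check is conservative rather than a precise model of any particular shell.

```python
import shlex


def is_shell_neutral(name: str) -> bool:
    """True if `name` could be typed in a POSIX shell without any quoting."""
    # shlex.quote() wraps the name in single quotes as soon as it contains
    # anything outside a small safe set (so '*', '?', spaces, '$', ... all trip it).
    return shlex.quote(name) == name
```

For example, `is_shell_neutral("notes.txt")` holds, while `"*"` and `"my file"` do not. Note that this says nothing about leading dashes, which the shell passes through untouched.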