Python's glob module and Unix's find command don't recognize non-ASCII characters


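On HFS+, Mac OS X always stores filenames in decomposed form (a variant of Unicode Normalization Form D, NFD), so "ä" comes back from the filesystem as "a" followed by a combining diaeresis. Use unicodedata.normalize('NFD', pattern) to decompose the glob pattern the same way: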

import glob
import unicodedata
glob.glob(unicodedata.normalize('NFD', '*/Bärlauch*'))  # decompose to match HFS+ names
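To see why the un-normalized pattern finds nothing, here is a minimal sketch that uses fnmatch (the pattern matcher glob relies on) on plain strings, so no HFS+ volume is needed. It assumes Python 3, and the helper name matches_on_hfs is hypothetical, made up for this illustration.

```python
import fnmatch
import unicodedata


def matches_on_hfs(name, pattern):
    """Hypothetical helper: compare a filename and a glob pattern after
    decomposing both to NFD, the form HFS+ uses for stored names."""
    return fnmatch.fnmatchcase(unicodedata.normalize('NFD', name),
                               unicodedata.normalize('NFD', pattern))


# A name as HFS+ returns it: "a" followed by U+0308 COMBINING DIAERESIS.
name_on_disk = unicodedata.normalize('NFD', 'Bärlauch.txt')
# A pattern typed in an editor is usually precomposed (NFC): a single U+00E4.
pattern = unicodedata.normalize('NFC', 'Bärlauch*')

print(len(name_on_disk))                           # 13 (one more than the NFC form)
print(fnmatch.fnmatchcase(name_on_disk, pattern))  # False
print(matches_on_hfs(name_on_disk, pattern))       # True
```

Normalizing both sides, rather than only the pattern, also keeps the comparison working on filesystems that hand back precomposed names.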


Python programs are fundamentally text files. Conventionally, people write them using only characters from the ASCII character set, and thus do not have to think about the encoding they save them in: virtually all common encodings are ASCII-compatible, so they agree on how ASCII characters should be decoded.

You have written a Python program that contains a non-ASCII character. Your program therefore comes with an implicit encoding (which you haven't mentioned): to save such a file, your editor has to decide how to represent the a-umlaut on disk. I suspect your editor chose a non-Unicode encoding for you.
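One way to see what "representing the a-umlaut on disk" means is to encode the same text with a few encodings. This sketch assumes Python 3 and picks UTF-8 and Latin-1 purely as examples: ASCII text comes out identical everywhere, while ä does not.

```python
import unicodedata

# Pure ASCII text encodes to the same bytes in every ASCII-compatible encoding.
for enc in ('ascii', 'utf-8', 'latin-1', 'cp1252'):
    assert 'glob'.encode(enc) == b'glob'

a_umlaut = '\u00e4'  # precomposed "ä"
print(a_umlaut.encode('utf-8'))    # b'\xc3\xa4'
print(a_umlaut.encode('latin-1'))  # b'\xe4'
# The decomposed (NFD) form is two code points, hence different bytes again.
print(unicodedata.normalize('NFD', a_umlaut).encode('utf-8'))  # b'a\xcc\x88'

# Reading Latin-1 bytes as UTF-8 fails: a small-scale version of what happens
# when the editor and the interpreter disagree about a file's encoding.
try:
    b'\xe4'.decode('utf-8')
except UnicodeDecodeError as exc:
    print('decode failed:', exc.reason)
```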

Anyway, there are two ways around such a problem: either you can restrict yourself to using only ASCII characters in the source code of your program, or you can declare to Python that you want it to read the text file with a specific encoding.

To do the former, replace the a-umlaut with its Unicode escape sequence, \u00e4 (U+00E4, LATIN SMALL LETTER A WITH DIAERESIS); in Python 2 this escape only works inside a unicode literal such as u'B\u00e4rlauch'. To do the latter, add a coding declaration at the top of the file:
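With an escape, the source file itself stays pure ASCII, so the editor's choice of encoding no longer matters. A minimal sketch, assuming Python 3 (in Python 2 the literal would need the u prefix), that also applies the NFD normalization needed on HFS+:

```python
import glob
import unicodedata

pattern = '*/B\u00e4rlauch*'  # "\u00e4" is the escape for "ä"; the source stays ASCII
assert unicodedata.name('\u00e4') == 'LATIN SMALL LETTER A WITH DIAERESIS'

# Decompose so the pattern matches names as HFS+ stores them.
nfd_pattern = unicodedata.normalize('NFD', pattern)
print(len(pattern), len(nfd_pattern))  # 11 12
print(ascii(nfd_pattern))              # '*/Ba\u0308rlauch*'

matches = glob.glob(nfd_pattern)  # list of matching paths under the current directory
```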

# -*- coding: <your encoding> -*-
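To check that the declaration really changes how Python reads the bytes, this sketch (assuming Python 3) compiles the same one-line program saved once as Latin-1 and once as UTF-8, each with a matching declaration; latin-1 and utf-8 are only example values for <your encoding>. When compile() is given bytes, it decodes them according to the coding declaration.

```python
# The same statement, s = 'ä', saved in two different encodings.
latin1_source = b"# -*- coding: latin-1 -*-\ns = '\xe4'\n"
utf8_source = b"# -*- coding: utf-8 -*-\ns = '\xc3\xa4'\n"

results = {}
for label, source in (('latin-1', latin1_source), ('utf-8', utf8_source)):
    namespace = {}
    # Given bytes, compile() honors the coding declaration on the first line.
    exec(compile(source, '<' + label + '>', 'exec'), namespace)
    results[label] = namespace['s']

print(results['latin-1'] == results['utf-8'] == '\u00e4')  # True
```

Different bytes on disk, same string in memory: the declaration is what tells Python which mapping to use.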