Python regex matching Unicode properties
Have you tried Ponyguruma, a Python binding to the Oniguruma regular expression engine? In that engine you can simply say \p{Armenian}
to match Armenian characters. \p{Ll}
or \p{Zs}
work too.
You can painstakingly use unicodedata on each character:
import unicodedatadef strip_accents(x): return u''.join(c for c in unicodedata.normalize('NFD', x) if unicodedata.category(c) != 'Mn')