In Python, how to check if a string only contains certain characters?


Here's a simple, pure-Python implementation. It should be used when performance is not critical (included for future Googlers).

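```python
import string

# Allowed characters: lowercase ASCII letters, digits and '.'.
allowed = set(string.ascii_lowercase + string.digits + '.')

def check(test_str):
    # True if every character of test_str is in the allowed set.
    return set(test_str) <= allowed
```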

Regarding performance, iteration will probably be the fastest method. Regexes have to iterate through a state machine, and the set-subset solution has to build a temporary set. However, the difference is unlikely to matter much. If the performance of this function is very important, write it as a C extension module with a switch statement (which the compiler can turn into a jump table).
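To check that claim on your own machine, here is a sketch that times the three pure-Python approaches (explicit iteration, set subset, regex) with the standard `timeit` module. The function names `check_loop`, `check_set` and `check_regex` and the sample string are illustrative choices, not part of the answer above, and no timings are claimed here: results depend on the Python version and the input.

```python
import re
import string
import timeit

# Allowed characters: lowercase ASCII letters, digits and '.'.
ALLOWED = frozenset(string.ascii_lowercase + string.digits + '.')
# Matches the first character that is NOT allowed.
DISALLOWED = re.compile(r'[^a-z0-9.]')


def check_loop(s):
    """Explicit iteration: stops at the first disallowed character."""
    for c in s:
        if c not in ALLOWED:
            return False
    return True


def check_set(s):
    """Set-subset test: builds a temporary set of all characters in s."""
    return set(s) <= ALLOWED


def check_regex(s):
    """Regex test: True when no disallowed character is found."""
    return DISALLOWED.search(s) is None


if __name__ == '__main__':
    sample = 'jsdlfjdsf12324..3432jsdflsdf'
    for func in (check_loop, check_set, check_regex):
        seconds = timeit.timeit(lambda: func(sample), number=100_000)
        print(f'{func.__name__}: {seconds:.3f} s per 100,000 calls')
```

Try it with both short and long strings, and with strings that fail early, since the loop and regex versions stop at the first bad character while the set version always consumes the whole string.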

Here's a C implementation, which uses if statements due to space constraints. If you absolutely need the tiny bit of extra speed, write out the switch-case. In my tests, it performs very well (2 seconds vs 9 seconds in benchmarks against the regex).

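```c
#define PY_SSIZE_T_CLEAN
#include <Python.h>

/* Return True if the string contains only '.', '0'-'9' and 'a'-'z'. */
static PyObject *
check(PyObject *self, PyObject *args)
{
    const char *s;
    Py_ssize_t count, ii;
    char c;

    /* "s#" gives a pointer to the (UTF-8) data plus its length. */
    if (!PyArg_ParseTuple(args, "s#", &s, &count)) {
        return NULL;
    }
    for (ii = 0; ii < count; ii++) {
        c = s[ii];
        /* Reject anything below '0' except '.', and anything above 'z'.
           Non-ASCII bytes fall outside both ranges and are rejected too. */
        if ((c < '0' && c != '.') || c > 'z') {
            Py_RETURN_FALSE;
        }
        /* Reject the gap between '9' and 'a' (includes 'A'-'Z'). */
        if (c > '9' && c < 'a') {
            Py_RETURN_FALSE;
        }
    }
    Py_RETURN_TRUE;
}

PyDoc_STRVAR(DOC, "Fast stringcheck");

static PyMethodDef PROCEDURES[] = {
    {"check", (PyCFunction) check, METH_VARARGS, NULL},
    {NULL, NULL, 0, NULL}
};

/* Python 3 module definition and initialisation function. */
static struct PyModuleDef stringcheck_module = {
    PyModuleDef_HEAD_INIT, "stringcheck", DOC, -1, PROCEDURES
};

PyMODINIT_FUNC
PyInit_stringcheck(void)
{
    return PyModule_Create(&stringcheck_module);
}
```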

Include it in your setup.py:

```python
from setuptools import setup, Extension

setup(
    name='stringcheck',
    ext_modules=[
        Extension('stringcheck', ['stringcheck.c']),
    ],
)
```

Use as:

```python
>>> from stringcheck import check
>>> check("abc")
True
>>> check("ABC")
False
```
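The C version is fast because each character is classified with a couple of comparisons. A rough pure-Python analogue of that table-driven idea is `str.translate`: build a table that deletes every allowed character, and the string passes if nothing is left. This is a sketch under that assumption, not something benchmarked in this answer; the names `check_translate` and `_DELETE_ALLOWED` are hypothetical.

```python
import string

# Translation table mapping every allowed character to None (i.e. delete it).
# With three arguments, str.maketrans maps each character of the third to None.
_DELETE_ALLOWED = str.maketrans('', '', string.ascii_lowercase + string.digits + '.')


def check_translate(s):
    """Return True if s contains only a-z, 0-9 and '.'.

    All allowed characters are deleted; anything left over is disallowed.
    Unlike the C loop, this always scans the whole string, even after a
    disallowed character has been seen.
    """
    return not s.translate(_DELETE_ALLOWED)


print(check_translate('abc'))   # True
print(check_translate('ABC'))   # False
```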


Final(?) edit

Answer, wrapped up in a function, with annotated interactive session:

```python
>>> import re
>>> def special_match(strg, search=re.compile(r'[^a-z0-9.]').search):
...     return not bool(search(strg))
...
>>> special_match("")
True
>>> special_match("az09.")
True
>>> special_match("az09.\n")
False
# The above test case is to catch out any attempt to use re.match()
# with a `$` instead of `\Z` -- see point (6) below.
>>> special_match("az09.#")
False
>>> special_match("az09.X")
False
>>>
```

Note: There is a comparison with using re.match() further down in this answer. Further timings show that match() would win with much longer strings. However, match() seems to have a much larger overhead than search() when the final answer is True. This is puzzling (perhaps it's the cost of returning a MatchObject instead of None) and may warrant further investigation.
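On Python 3.4 and later, `re.fullmatch()` anchors the pattern at both ends of the string, which sidesteps the `$` versus `\Z` pitfall without writing any anchor at all. A minimal sketch; the name `fullmatch_check` is hypothetical, and its speed relative to `search()` has not been measured here.

```python
import re

# fullmatch() succeeds only if the WHOLE string matches, so no ^, $ or \Z is needed.
# '*' (rather than '+') lets the empty string pass, like special_match("").
_ALLOWED_RUN = re.compile(r'[a-z0-9.]*')


def fullmatch_check(strg):
    """Return True if strg consists only of a-z, 0-9 and '.'."""
    return _ALLOWED_RUN.fullmatch(strg) is not None
```

It should give the same answers as `special_match` on the test cases in the session above, including rejecting `"az09.\n"`.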

==== Earlier text ====

The [previously] accepted answer could use a few improvements:

(1) Presentation gives the appearance of being the result of an interactive Python session:

```python
reg=re.compile('^[a-z0-9\.]+$')
>>>reg.match('jsdlfjdsf12324..3432jsdflsdf')
True
```

but match() doesn't return True; it returns a match object on success and None on failure.

(2) For use with match(), the ^ at the start of the pattern is redundant (match() only looks for a match at the beginning of the string), and it appears to be slightly slower than the same pattern without the ^.

(3) It should encourage the habit of always using a raw string for any re pattern.

(4) The backslash in front of the dot/period is redundant, because inside a character class a dot is already literal.

(5) Slower than the OP's code!

```
prompt>rem OP's version -- NOTE: OP used raw string!
prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile(r'[^a-z0-9\.]')" "not bool(reg.search(t))"
1000000 loops, best of 3: 1.43 usec per loop

prompt>rem OP's version w/o backslash
prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile(r'[^a-z0-9.]')" "not bool(reg.search(t))"
1000000 loops, best of 3: 1.44 usec per loop

prompt>rem cleaned-up version of accepted answer
prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile(r'[a-z0-9.]+\Z')" "bool(reg.match(t))"
100000 loops, best of 3: 2.07 usec per loop

prompt>rem accepted answer
prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile('^[a-z0-9\.]+$')" "bool(reg.match(t))"
100000 loops, best of 3: 2.08 usec per loop
```

(6) Can produce the wrong answer!!

```python
>>> import re
>>> bool(re.compile('^[a-z0-9\.]+$').match('1234\n'))
True # uh-oh
>>> bool(re.compile('^[a-z0-9\.]+\Z').match('1234\n'))
False
```
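Points (1), (2), (4) and (6) can be checked directly. The following sketch (Python 3; the variable names are illustrative) shows that match() returns a match object rather than True, that a leading ^ changes nothing for match(), that an escaped and an unescaped dot behave the same inside a character class, and that `$` accepts a trailing newline where `\Z` does not.

```python
import re

t = 'jsdlfjdsf12324..3432jsdflsdf'

# (1) match() returns a match object on success (None on failure), not True.
m = re.compile(r'[a-z0-9.]+\Z').match(t)
print(repr(m))                                        # a match object, not True
print(bool(m))                                        # True: wrap in bool() if a real bool is needed

# (2) match() is already anchored at the start, so a leading ^ adds nothing.
with_caret = re.compile(r'^[a-z0-9.]+\Z').match(t)
without_caret = re.compile(r'[a-z0-9.]+\Z').match(t)
print(with_caret.span() == without_caret.span())     # True

# (4) Inside a character class '.' is literal, so '[\.]' and '[.]' are equivalent.
print(re.match(r'[\.]', '.') is not None)             # True
print(re.match(r'[.]', 'x') is None)                  # True: '[.]' does not match any char

# (6) '$' also matches just before a trailing newline; '\Z' only at the very end.
print(bool(re.match(r'[a-z0-9.]+$', '1234\n')))      # True  (uh-oh)
print(bool(re.match(r'[a-z0-9.]+\Z', '1234\n')))     # False
```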


Simpler approach? A little more Pythonic?

```python
>>> ok = "0123456789abcdef"
>>> all(c in ok for c in "123456abc")
True
>>> all(c in ok for c in "hello world")
False
```

It certainly isn't the most efficient, but it sure is readable.
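A small variation on the all() version stores the allowed characters in a frozenset, so `c in ok` is a hash lookup rather than a scan of a string; this may help as the allowed alphabet grows, though it is untested here. The helper name `only_contains` is chosen for illustration. Like the other answers, it returns True for the empty string (all() of an empty iterable is True), and all() stops at the first disallowed character.

```python
def only_contains(text, allowed):
    """Return True if every character of text is in allowed.

    allowed is converted to a frozenset so each membership test is a hash
    lookup; all() short-circuits on the first character not in the set.
    If called in a tight loop, build the frozenset once outside instead.
    """
    ok = frozenset(allowed)
    return all(c in ok for c in text)


print(only_contains("123456abc", "0123456789abcdef"))    # True
print(only_contains("hello world", "0123456789abcdef"))  # False
print(only_contains("", "0123456789abcdef"))             # True
```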