Python: splitting string by all space characters


Edit

It turns out that \u200b is not technically defined as whitespace, so Python does not match it with \s even with the Unicode flag on. It must therefore be treated as a non-whitespace character and added to the pattern explicitly.

http://en.wikipedia.org/wiki/Whitespace_character#Unicode

http://bugs.python.org/issue13391

import re
re.split(ur"[\u200b\s]+", "some string", flags=re.UNICODE)
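For anyone on Python 3, here is a minimal sketch of the same idea, assuming Python 3.3 or later (the `ur` prefix is gone there, and `re` understands `\u` escapes in raw `str` patterns). The helper name `split_spaces` and the sample string are made up for illustration.

```python
import re

# Whitespace (\s) plus ZERO WIDTH SPACE (U+200B), which \s does not cover.
# On Python 3, str patterns are Unicode-aware by default, so no flag is needed.
SEPARATORS = re.compile(r"[\u200b\s]+")


def split_spaces(text):
    """Split text on runs of whitespace and U+200B (hypothetical helper)."""
    return SEPARATORS.split(text)


# With U+200B in the character class, both separators are used.
print(split_spaces("a\u200bc d"))      # ['a', 'c', 'd']

# With \s alone, the zero-width space stays inside the first piece.
print(re.split(r"\s+", "a\u200bc d"))  # ['a\u200bc', 'd']
```

The `+` makes a run of mixed separators count as one split point, so no empty strings appear between words.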


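You can use a regular expression with Unicode matching enabled:

>>> re.split(r'(?u)\s', u'a\u200bc d')
[u'a', u'c', u'd']

Note that this relies on the interpreter classing \u200b as whitespace. On versions that do not, as described in the edit above, the result is [u'a\u200bc', u'd'] and \u200b has to be added to the pattern.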


You can use re.split, like this:

import re
re.split(u'\\s|\u200b', your_string)
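One thing to watch with this single-character alternation: every separator is its own split point, so adjacent separators produce empty strings in the result. A minimal sketch of one way to handle that, assuming Python 3 and that empty pieces are unwanted (the helper name `split_nonempty` is made up for illustration):

```python
import re


def split_nonempty(text):
    """Split on each whitespace character or U+200B, dropping empty pieces.

    Hypothetical helper; the pattern matches one separator at a time.
    """
    parts = re.split("\\s|\u200b", text)
    return [p for p in parts if p]


# Two spaces in a row yield an empty string between them.
print(re.split("\\s|\u200b", "a  b"))         # ['a', '', 'b']

# The helper filters those out.
print(split_nonempty("a \u200b b\u200bc"))    # ['a', 'b', 'c']
```

Filtering afterwards keeps the pattern itself unchanged. Quantifying the pattern, for example `(?:\s|\u200b)+`, is another option when only runs between words matter.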