Python: splitting string by all space characters
Edit
It turns out that \u200b (zero-width space) is not technically defined as whitespace, so Python does not match it with \s even with the Unicode flag on. It must therefore be treated as a non-whitespace character and listed explicitly in the pattern.
http://en.wikipedia.org/wiki/Whitespace_character#Unicode
http://bugs.python.org/issue13391
import re
re.split(ur"[\u200b\s]+", "some string", flags=re.UNICODE)
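For anyone on Python 3, where the ur"" prefix is gone and str patterns are Unicode-aware by default, here is a minimal sketch of the same idea. The helper name split_on_spaces is made up for illustration, and dropping empty pieces is my own choice rather than something the snippet above does.

```python
import re

# U+200B is listed explicitly because \s does not match it.
# In Python 3, \s on a str pattern already covers Unicode whitespace.
_SEPARATORS = re.compile(r"[\u200b\s]+")


def split_on_spaces(text):
    """Hypothetical helper: split on whitespace and zero-width spaces.

    Empty pieces (from leading or trailing separators) are dropped.
    """
    return [piece for piece in _SEPARATORS.split(text) if piece]


print(split_on_spaces("a\u200bc d\u3000e"))
```

If other invisible characters show up in your data, they would have to be added to the character class in the same way.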
You can use a regular expression with Unicode matching enabled:
>>> re.split(r'(?u)\s', u'a\u200bc d')
[u'a', u'c', u'd']
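To see how your own interpreter treats \u200b, a quick check along these lines may help. This is a sketch written for Python 3 (an assumption, since the snippets here are Python 2), using only the standard re and unicodedata modules.

```python
import re
import unicodedata

ZWSP = "\u200b"  # ZERO WIDTH SPACE

# Unicode general category of the character ("Zs" would mean space separator).
category = unicodedata.category(ZWSP)

# Whether Python's own whitespace test accepts it.
is_space = ZWSP.isspace()

# What splitting on \s alone does with it.
parts = re.split(r"\s", "a\u200bc d")

print(category, is_space, parts)
```

On Python 3 the category comes back as Cf (format), isspace() is False, and the zero-width space stays inside the first piece, which is why it has to be named explicitly in the pattern.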