Regex punctuation split [Python]


The official Python documentation has a good example for this. The pattern \W+ splits on runs of non-alphanumeric characters (whitespace and punctuation); \W is the character class for all non-word characters. Note: the underscore "_" counts as a "word" character, so it is not split on here.

import re
re.split(r'\W+', 'Words, words, words.')  # ['Words', 'words', 'words', '']

See https://docs.python.org/3/library/re.html for more examples (search the page for "re.split").
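
The underscore note is easiest to see side by side. The following is a minimal sketch, not taken from the documentation: the sample text and the variable names by_nonword and with_underscore are made up for illustration. It also shows the empty string that re.split leaves at the end when the input finishes with a delimiter.

```python
import re

# Hypothetical sample text, chosen to contain an underscore and a trailing period.
text = 'snake_case words, more-words.'

# \W+ treats "_" as a word character, so "snake_case" stays in one piece.
# The trailing "." leaves an empty string at the end of the result.
by_nonword = re.split(r'\W+', text)
print(by_nonword)
# ['snake_case', 'words', 'more', 'words', '']

# Adding "_" to the character class splits on underscores as well.
with_underscore = re.split(r'[\W_]+', text)
print(with_underscore)
# ['snake', 'case', 'words', 'more', 'words', '']
```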


Using string.punctuation and character class:

>>> import re
>>> from string import punctuation
>>> r = re.compile(r'[\s{}]+'.format(re.escape(punctuation)))
>>> r.split('dss!dfs^  #$% jjj^')
['dss', 'dfs', 'jjj', '']
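
The trailing '' in that result appears because the input ends with a delimiter. One possible way to drop such empty pieces is sketched below; SPLITTER and split_tokens are hypothetical names introduced here, not part of the re or string modules. Note that string.punctuation contains "_", so unlike \W+ this pattern also splits on underscores.

```python
import re
from string import punctuation

# Same pattern as above: one or more whitespace or ASCII punctuation characters.
SPLITTER = re.compile(r'[\s{}]+'.format(re.escape(punctuation)))


def split_tokens(text):
    """Split on whitespace/punctuation and discard empty pieces (hypothetical helper)."""
    return [piece for piece in SPLITTER.split(text) if piece]


tokens = split_tokens('dss!dfs^  #$% jjj^')
print(tokens)
# ['dss', 'dfs', 'jjj']
```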


Splitting on whitespace or a fixed set of delimiters:

import re
st = 'one two,three; four-five,    six'
print(re.split(r'\s+|[,;.-]\s*', st))
# ['one', 'two', 'three', 'four', 'five', 'six']