Split Strings into words with multiple word boundary delimiters


re.split()

re.split(pattern, string[, maxsplit=0])

Split string by the occurrences of pattern. If capturing parentheses are used in pattern, then the text of all groups in the pattern is also returned as part of the resulting list. If maxsplit is nonzero, at most maxsplit splits occur, and the remainder of the string is returned as the final element of the list. (Incompatibility note: in the original Python 1.5 release, maxsplit was ignored. This has been fixed in later releases.)

>>> re.split(r'\W+', 'Words, words, words.')
['Words', 'words', 'words', '']
>>> re.split(r'(\W+)', 'Words, words, words.')
['Words', ', ', 'words', ', ', 'words', '.', '']
>>> re.split(r'\W+', 'Words, words, words.', maxsplit=1)
['Words', 'words, words.']
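When only a specific set of delimiters should count as word boundaries, a character class can be passed to re.split() instead of \W+. The following is a minimal sketch, not a definitive implementation; the helper name split_words and its default delimiter set are assumptions made for illustration.

```python
import re

def split_words(text, delimiters=" ,;.!?-"):
    """Split text on any run of the given delimiter characters (hypothetical helper)."""
    # re.escape guards characters such as '-' or '.' that are special in a regex.
    pattern = "[" + re.escape(delimiters) + "]+"
    # A leading or trailing delimiter produces an empty string, so drop those.
    return [word for word in re.split(pattern, text) if word]

print(split_words("Hey, you - what are you doing here!?"))
# ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']
```

Filtering out empty strings avoids the trailing '' seen in the first re.split() example above, which appears whenever the string ends with a delimiter.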


A case where regular expressions are justified:

import re
DATA = "Hey, you - what are you doing here!?"
print(re.findall(r"[\w']+", DATA))
# Prints ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']
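One caveat with [\w']+ is that it also keeps apostrophes used as quotation marks at the edges of a word. A possible refinement, sketched here under the assumption that apostrophes should only be kept inside a word, is shown below; the pattern and the name WORD are illustrative, not part of the original answer.

```python
import re

# Assumed pattern: word characters, optionally joined by internal apostrophes.
WORD = re.compile(r"\w+(?:'\w+)*")

text = "Don't stop 'quoted' words"
print(re.findall(r"[\w']+", text))  # ["Don't", 'stop', "'quoted'", 'words']
print(WORD.findall(text))           # ["Don't", 'stop', 'quoted', 'words']
```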


Another quick way to do this without a regexp is to replace each delimiter with a space first and then call split(), as below:

>>> 'a;bcd,ef g'.replace(';', ' ').replace(',', ' ').split()
['a', 'bcd', 'ef', 'g']
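Chaining replace() calls gets unwieldy as the number of delimiters grows. One way to generalize the same idea, still without a regexp, is a translation table built with str.maketrans(). This is a sketch only; the helper name split_on and its default delimiters are assumptions for illustration.

```python
def split_on(text, delimiters=";,"):
    """Replace each delimiter character with a space, then split on whitespace."""
    # Build a table that maps every delimiter character to a single space.
    table = str.maketrans({ch: " " for ch in delimiters})
    # translate() applies all replacements in one pass over the string.
    return text.translate(table).split()

print(split_on("a;bcd,ef g"))  # ['a', 'bcd', 'ef', 'g']
```

Because split() with no arguments collapses runs of whitespace, consecutive delimiters do not produce empty strings.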