How to combine multiple regex into single one in python?


Join the individual patterns into a single alternation with | and compile the result once. Check this example:

import re

re1 = r'\d+\.\d*[L][-]\d*\s[A-Z]*[/]\d*'
re2 = r'\d*[/]\d*[A-Z]*\d*\s[A-Z]*\d*[A-Z]*'
re3 = r'[A-Z]*\d+[/]\d+[A-Z]\d+'
re4 = r'\d+[/]\d+[A-Z]*\d+\s\d+[A-Z]\s[A-Z]*'

# Compile the combined pattern once, outside the loop.
generic_re = re.compile("(%s|%s|%s|%s)" % (re1, re2, re3, re4))

# string1 ... string4 are the input strings from the question.
sentences = [string1, string2, string3, string4]
for sentence in sentences:
    matches = generic_re.findall(sentence)
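If you also need to know which of the combined patterns produced each match, one option is to wrap every alternative in a named group and read the match object's lastgroup. The following is only a minimal sketch: the pattern names and the two toy patterns are hypothetical stand-ins for re1 to re4, and tag_matches is an illustrative helper, not something from the question.

```python
import re

# Hypothetical toy patterns standing in for re1..re4 from the question.
patterns = {
    "number": r"\d+",
    "word": r"[A-Za-z]+",
}

# Wrap each alternative in a named group so a match can report which one fired.
combined = re.compile(
    "|".join(f"(?P<{name}>{pat})" for name, pat in patterns.items())
)

def tag_matches(text):
    """Return (pattern_name, matched_text) pairs in order of appearance."""
    return [(m.lastgroup, m.group()) for m in combined.finditer(text)]

print(tag_matches("abc 123 de"))
# [('word', 'abc'), ('number', '123'), ('word', 'de')]
```

This relies on the sub-patterns not defining named groups of their own that clash with the wrapper names.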


To run findall with an arbitrary series of REs, call it once per RE and concatenate the lists of matches that each call returns:

import re

re_list = [
    r'\d+\.\d*[L][-]\d*\s[A-Z]*[/]\d*',  # re1 in question
    ...
    r'\d+[/]\d+[A-Z]*\d+\s\d+[A-Z]\s[A-Z]*',  # re4 in question
]

matches = []
for r in re_list:
    matches += re.findall(r, string)

For efficiency, especially when the same patterns are applied to many strings, it would be better to use a list of compiled REs.
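A minimal sketch of the compiled-list variant might look like this. The two toy patterns and the find_all helper are assumptions made for illustration; substitute the real list from the question. Note that the re module already caches recently used patterns, so the gain from precompiling is usually modest.

```python
import re

# Hypothetical toy patterns; replace with the real list from the question.
re_list = [r"\d+", r"[A-Z]+"]

# Compile once, up front, then reuse the compiled objects for every input.
compiled = [re.compile(p) for p in re_list]

def find_all(text):
    """Concatenate the matches of every compiled pattern, pattern by pattern."""
    matches = []
    for rx in compiled:
        matches += rx.findall(text)
    return matches

print(find_all("AB12 C3"))
# ['12', '3', 'AB', 'C']
```

The result is grouped by pattern rather than by position in the string, which is the main visible difference from a single combined regex.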

Alternatively, you could join the individual RE strings with | into a single compiled pattern:

generic_re = re.compile('|'.join(re_list))
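One thing to watch with the joined pattern: if any of the individual REs contains a capturing group, findall returns the group contents instead of the whole match. A hedged sketch of a workaround using finditer is below; the toy patterns and the find_all helper are hypothetical and only there to make the behaviour visible.

```python
import re

# Hypothetical toy patterns; the second one contains a capturing group.
re_list = [r"\d+", r"(ab)+"]

generic_re = re.compile("|".join(re_list))

def find_all(text):
    """Return the full text of every match, regardless of inner groups."""
    return [m.group() for m in generic_re.finditer(text)]

print(find_all("12 abab 7"))
# ['12', 'abab', '7']

print(generic_re.findall("12 abab 7"))
# ['', 'ab', '']  (findall reports group 1, not the whole match)
```

Matches from the joined pattern never overlap, so the output can differ from concatenating the per-pattern findall results.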


I see lots of people are using pipes, but an alternation only matches whichever alternative succeeds first at a given position. If you want every pattern to match the same string, then try using lookaheads.

Example:

>>> import re
>>> fruit_string = "10a11p"
>>> fruit_regex = r'(?=.*?(?P<pears>\d+)p)(?=.*?(?P<apples>\d+)a)'
>>> re.match(fruit_regex, fruit_string).groupdict()
{'pears': '11', 'apples': '10'}
>>> re.match(fruit_regex, fruit_string).group(0)
''
>>> re.match(fruit_regex, fruit_string).group(1)
'11'

Note that group(0) is empty: the lookaheads consume nothing, so only the named groups carry the captured text.

(?=...) is a lookahead:

Matches if ... matches next, but doesn’t consume any of the string. This is called a lookahead assertion. For example, Isaac (?=Asimov) will match 'Isaac ' only if it’s followed by 'Asimov'.

.*?(?P<pears>\d+)p finds a number followed by a p anywhere in the string and names the captured number "pears".
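The same idea can be generalized to any number of patterns. The sketch below is one possible way to do it, not a definitive implementation: all_of and parse are hypothetical helper names, and the use of re.DOTALL (so that .*? can cross newlines) is an assumption about the input.

```python
import re

def all_of(*patterns):
    """Build a regex that matches only if every pattern occurs somewhere."""
    # Each pattern becomes its own lookahead; (?:...) keeps any top-level |
    # inside a pattern from leaking out of its lookahead.
    lookaheads = "".join(f"(?=.*?(?:{p}))" for p in patterns)
    return re.compile(lookaheads, re.DOTALL)

fruit_re = all_of(r"(?P<pears>\d+)p", r"(?P<apples>\d+)a")

def parse(text):
    """Return the named captures, or None if any pattern is missing."""
    m = fruit_re.match(text)
    return m.groupdict() if m else None

print(parse("10a11p"))  # {'pears': '11', 'apples': '10'}
print(parse("11p10a"))  # {'pears': '11', 'apples': '10'}
print(parse("10a"))     # None
```

Because every lookahead starts from the beginning of the string, the order in which the items appear does not matter.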