How do I search for a pattern within a text file using Python combining regex & string/file operations and store instances of the pattern?
    import re

    pattern = re.compile(r"<(\d{4,5})>")

    with open('test.txt') as f:
        for i, line in enumerate(f):
            for match in pattern.finditer(line):
                # match.group() is the whole match; match.group(1) is only the number
                print('Found on line %s: %s' % (i + 1, match.group()))
A couple of notes about the regex:
- You don't need the ? at the end and the outer (...) if you don't want to match the number with the angle brackets, but only want the number itself
- It matches either 4 or 5 digits between the angle brackets
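To make the first note concrete, here is a minimal sketch comparing the two patterns with re.findall. The sample string is made up for illustration. Because the trailing ? makes the whole outer group optional, that pattern can also match the empty string, so the result includes empty ('', '') tuples alongside the real hit.

```python
import re

sample = "a<1234>b"  # made-up sample text, not from the question

# Outer group plus trailing "?": the group is optional, so the pattern
# also matches the empty string wherever no tag starts.
with_optional = re.findall(r"(<(\d{4,5})>)?", sample)

# Simplified pattern: a single capture group and no optional marker.
simplified = re.findall(r"<(\d{4,5})>", sample)

print(with_optional)
print(simplified)  # ['1234']
```

With a single capture group, findall returns plain strings rather than tuples, which is usually easier to store and work with.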
Update: It's important to understand that the match and capture in a regex can be quite different. The regex in my snippet above matches the pattern with angle brackets, but I ask to capture only the internal number, without the angle brackets.
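A small sketch of the match versus capture distinction, assuming a made-up sample string: group(0) is everything the regex matched, while group(1) is only what the parentheses captured.

```python
import re

pattern = re.compile(r"<(\d{4,5})>")
sample = "id <1234> and <56789>, but not <123>"  # made-up sample text

# group(0): the full match, angle brackets included
whole_matches = [m.group(0) for m in pattern.finditer(sample)]
# group(1): only the digits captured by the parentheses
captured = [m.group(1) for m in pattern.finditer(sample)]

print(whole_matches)  # ['<1234>', '<56789>']
print(captured)       # ['1234', '56789']
```

Storing the captured list is one way to keep the instances of the pattern for later use.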
More about regex in Python can be found in the Regular Expression HOWTO.
Doing it in one bulk read:
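    import re

    # filename is assumed to hold the path of the file to search
    with open(filename, 'r') as textfile:
        filetext = textfile.read()

    # note: because of the trailing ?, findall also returns empty ('', '') tuples
    matches = re.findall(r"(<(\d{4,5})>)?", filetext)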
Line by line:
    import re

    matches = []
    reg = re.compile(r"(<(\d{4,5})>)?")
    with open(filename, 'r') as textfile:
        for line in textfile:
            matches += reg.findall(line)
But again, the matches this returns are not useful for anything except counting unless you add an offset counter:
    import re

    matches = []
    offset = 0  # character offset of the current line's start within the file
    reg = re.compile(r"(<(\d{4,5})>)?")
    with open(filename, 'r') as textfile:
        for line in textfile:
            matches += [(reg.findall(line), offset)]
            offset += len(line)
But it still just makes more sense to read the whole file in at once.
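As a rough sketch of the read-everything-at-once approach with positions kept: re.finditer gives match objects, and match.start() is the offset into the whole text, so no manual counter is needed. The helper name find_with_offsets and the sample text are made up for illustration, and this uses the simpler <(\d{4,5})> pattern rather than the optional-group version.

```python
import re

PATTERN = re.compile(r"<(\d{4,5})>")

def find_with_offsets(text):
    """Return (captured_number, start_offset, line_number) for each match."""
    results = []
    for m in PATTERN.finditer(text):
        # Line number = newlines before the match start, plus one
        line_no = text.count("\n", 0, m.start()) + 1
        results.append((m.group(1), m.start(), line_no))
    return results

# Made-up sample standing in for the contents of a file
sample_text = "first <1234>\nnothing here\n<56789> last\n"
found = find_with_offsets(sample_text)
print(found)  # [('1234', 6, 1), ('56789', 26, 3)]
```

For a real file, the same function could be called on the result of reading the file in one go, for example find_with_offsets(f.read()) inside a with open(filename) as f: block.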