How do I search for a pattern within a text file using Python combining regex & string/file operations and store instances of the pattern?

python regex file-io text-mining string-parsing

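    import re

    # Match a 4- or 5-digit number in angle brackets; group 1 captures the digits only
    pattern = re.compile(r"<(\d{4,5})>")

    with open('test.txt') as f:
        for i, line in enumerate(f):
            for match in pattern.finditer(line):
                # match.group() is the full match (e.g. <1234>); match.group(1) is just the number
                print('Found on line %s: %s' % (i + 1, match.group(1)))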

A couple of notes about the regex:

You don't need a trailing ? or an outer (...) group around the pattern if you only want the number itself rather than the number together with its angle brackets.
The \d{4,5} part matches either 4 or 5 digits between the angle brackets.
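
As a rough illustration of the first note, the sketch below compares a pattern with the optional outer group against the simpler one. The sample string is made up for this example and is not from the question.

```python
import re

# Hypothetical sample text, for illustration only.
sample = "id <1234> and <56789>, but not <123> or <123456>"

# Optional outer group: the ? lets the pattern match the empty string,
# so findall also reports an empty match wherever nothing else matches.
optional_hits = re.findall(r"(<(\d{4,5})>)?", sample)

# Simpler pattern: only real matches, and only the captured digits come back.
number_hits = re.findall(r"<(\d{4,5})>", sample)

# Drop the empty tuples to see what the optional pattern really found.
non_empty = [hit for hit in optional_hits if hit[0]]

print(number_hits)
print(non_empty)
print(len(optional_hits), "results, of which", len(non_empty), "are non-empty")
```

With the simpler pattern there is nothing to filter out afterwards, which is the main practical reason to prefer it.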

Update: It's important to understand that the match and the capture in a regex can be quite different. The regex in my snippet above matches the whole pattern, angle brackets included, but captures only the number inside them.
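
A minimal sketch of that distinction, using a made-up input line (an assumption for illustration): group(0) is what the regex matched, and group(1) is what the parentheses captured.

```python
import re

pattern = re.compile(r"<(\d{4,5})>")

# Hypothetical input line, for illustration only.
line = "error code <4021> raised"

m = pattern.search(line)
whole = m.group(0)    # the full match, angle brackets included
number = m.group(1)   # only what the parentheses captured
span = m.span(1)      # start and end index of the captured digits in the line

print(whole, number, span)
```

The span of the captured group can be used to slice the number back out of the original line if its position matters.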

More about regular expressions in Python can be found in the Regular Expression HOWTO.

Doing it in one bulk read:

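    import re

    # filename is the path to the text file to search
    with open(filename, 'r') as textfile:
        filetext = textfile.read()

    # Each result is a tuple: (number with brackets, number alone).
    # No trailing ? on the group: an optional group would also match
    # the empty string at every position where the pattern does not apply.
    matches = re.findall(r"(<(\d{4,5})>)", filetext)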

Line by line:

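    import re

    matches = []
    # No trailing ? on the group: an optional group would also match the empty string
    reg = re.compile(r"(<(\d{4,5})>)")
    with open(filename, 'r') as textfile:
        for line in textfile:
            # findall returns (number with brackets, number alone) tuples
            matches += reg.findall(line)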

But again, the matches returned this way are not useful for much beyond counting unless you also keep an offset counter:

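    import re

    matches = []
    offset = 0  # character offset of the current line from the start of the file
    reg = re.compile(r"(<(\d{4,5})>)")
    with open(filename, 'r') as textfile:
        for line in textfile:
            # store this line's matches together with the offset where the line begins
            matches.append((reg.findall(line), offset))
            offset += len(line)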

But it still just makes more sense to read the whole file in at once.
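
One way to get positions from a single bulk read is sketched below. It uses re.finditer, whose match objects carry their own start offsets. The helper name find_numbers and the temporary demo file are hypothetical, introduced only for this example.

```python
import os
import re
import tempfile

pattern = re.compile(r"<(\d{4,5})>")

def find_numbers(path):
    """Return (line_number, offset, number) for each match in the file at path."""
    with open(path, "r") as f:
        text = f.read()
    results = []
    for m in pattern.finditer(text):
        # Line number = count of newlines before the match start, plus one.
        line_number = text.count("\n", 0, m.start()) + 1
        results.append((line_number, m.start(), m.group(1)))
    return results

# Hypothetical demo file, written only so the sketch runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first <1234>\nnothing here\n<56789> last <12>\n")
    demo_path = tmp.name

found = find_numbers(demo_path)
os.remove(demo_path)
print(found)
```

Counting newlines for every match rescans the text each time, which is fine for small files. For very large files, a running line counter would likely be a better fit.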
