Regular expression matching a multiline block of text

Try this:

re.compile(r"^(.+)\n((?:\n.+)+)", re.MULTILINE)

I think your biggest problem is that you're expecting the ^ and $ anchors to match linefeeds, but they don't. In multiline mode, ^ matches the position immediately following a newline and $ matches the position immediately preceding a newline.

Be aware, too, that a newline can consist of a linefeed (\n), a carriage-return (\r), or a carriage-return+linefeed (\r\n). If you aren't certain that your target text uses only linefeeds, you should use this more inclusive version of the regex:

re.compile(r"^(.+)(?:\n|\r\n?)((?:(?:\n|\r\n?).+)+)", re.MULTILINE)

BTW, you don't want to use the DOTALL modifier here; you're relying on the fact that the dot matches everything except newlines.

python regex multiline

This will work:

>>> import re>>> rx_sequence=re.compile(r"^(.+?)\n\n((?:[A-Z]+\n)+)",re.MULTILINE)>>> rx_blanks=re.compile(r"\W+") # to remove blanks and newlines>>> text="""Some varying text1...... AAABBBBBBCCCCCCDDDDDDD... EEEEEEEFFFFFFFFGGGGGGG... HHHHHHIIIIIJJJJJJJKKKK...... Some varying text 2...... LLLLLMMMMMMNNNNNNNOOOO... PPPPPPPQQQQQQRRRRRRSSS... TTTTTUUUUUVVVVVVWWWWWW... """>>> for match in rx_sequence.finditer(text):...   title, sequence = match.groups()...   title = title.strip()...   sequence = rx_blanks.sub("",sequence)...   print "Title:",title...   print "Sequence:",sequence...   print...Title: Some varying text1Sequence: AAABBBBBBCCCCCCDDDDDDDEEEEEEEFFFFFFFFGGGGGGGHHHHHHIIIIIJJJJJJJKKKKTitle: Some varying text 2Sequence: LLLLLMMMMMMNNNNNNNOOOOPPPPPPPQQQQQQRRRRRRSSSTTTTTUUUUUVVVVVVWWWWWW

Some explanation about this regular expression might be useful: ^(.+?)\n\n((?:[A-Z]+\n)+)

The first character (^) means "starting at the beginning of a line". Be aware that it does not match the newline itself (same for $: it means "just before a newline", but it does not match the newline itself).
Then (.+?)\n\n means "match as few characters as possible (all characters are allowed) until you reach two newlines". The result (without the newlines) is put in the first group.
[A-Z]+\n means "match as many upper case letters as possible until you reach a newline. This defines what I will call a textline.
((?:textline)+) means match one or more textlines but do not put each line in a group. Instead, put all the textlines in one group.
You could add a final \n in the regular expression if you want to enforce a double newline at the end.
Also, if you are not sure about what type of newline you will get (\n or \r or \r\n) then just fix the regular expression by replacing every occurrence of \n by (?:\n|\r\n?).

python regex multiline

The following is a regular expression matching a multiline block of text:

import reresult = re.findall('(startText)(.+)((?:\n.+)+)(endText)',input)

CodeHunter

Regular expression matching a multiline block of text

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last