Parsing XML in Python with regex Parsing XML in Python with regex xml xml

Parsing XML in Python with regex


You normally don't want to use re.match. Quoting from the docs:

If you want to locate a match anywhere in string, use search() instead (see also search() vs. match()).

Note:

>>> print re.match('>.*<', line)None>>> print re.search('>.*<', line)<_sre.SRE_Match object at 0x10f666238>>>> print re.search('>.*<', line).group(0)>PLAINSBORO, NJ 08536-1906<

Also, why parse XML with regex when you can use something like BeautifulSoup :).

>>> from bs4 import BeautifulSoup as BS>>> line='<City_State>PLAINSBORO, NJ 08536-1906</City_State>'>>> soup = BS(line)>>> print soup.find('city_state').textPLAINSBORO, NJ 08536-1906


Please, just use an XML parser like ElementTree

>>> from xml.etree import ElementTree as ET>>> line='<City_State>PLAINSBORO, NJ 08536-1906</City_State>'>>> ET.fromstring(line).text'PLAINSBORO, NJ 08536-1906'


re.match returns a match only if the pattern matches the entire string. To find substrings matching the pattern, use re.search.

And yes, this is a simple way to parse XML, but I would highly encourage you to use a library specifically designed for the task.