
Extracting text from XML using python


Python's standard library already includes an XML parser, `xml.etree.ElementTree`. For example:

```python
>>> from xml.etree import ElementTree as ET
>>> xmlstr = """
... <root>
... <page>
...   <title>Chapter 1</title>
...   <content>Welcome to Chapter 1</content>
... </page>
... <page>
...  <title>Chapter 2</title>
...  <content>Welcome to Chapter 2</content>
... </page>
... </root>
... """
>>> root = ET.fromstring(xmlstr)
>>> for page in root:
...     title = page.find('title').text
...     content = page.find('content').text
...     print('title: %s; content: %s' % (title, content))
...
title: Chapter 1; content: Welcome to Chapter 1
title: Chapter 2; content: Welcome to Chapter 2
```
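If a `<page>` might lack a `<title>` or `<content>` child, `page.find(...)` returns `None` and `.text` then raises `AttributeError`. A minimal sketch of a more defensive variant uses `Element.findtext`, which returns a default instead. The helper name `extract_pages` and the empty-string default are illustrative choices, not part of the original answer:

```python
from xml.etree import ElementTree as ET


def extract_pages(xml_text):
    """Hypothetical helper: return (title, content) tuples for each <page>.

    Missing <title>/<content> children yield '' instead of raising.
    """
    root = ET.fromstring(xml_text)
    pages = []
    # iter() visits <page> elements at any depth below the root
    for page in root.iter('page'):
        # findtext returns the default when the child element is absent
        title = page.findtext('title', default='')
        content = page.findtext('content', default='')
        pages.append((title, content))
    return pages


xmlstr = """
<root>
  <page><title>Chapter 1</title><content>Welcome to Chapter 1</content></page>
  <page><title>Chapter 2</title></page>
</root>
"""

for title, content in extract_pages(xmlstr):
    print('title: %s; content: %s' % (title, content))
```

Note that `iter('page')` searches the whole tree, whereas iterating over `root` directly only visits its immediate children.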


Alternatively, you can extract the text with BeautifulSoup (the `bs4` package):

```python
from bs4 import BeautifulSoup

data = """<page>
  <title>Chapter 1</title>
  <content>Welcome to Chapter 1</content>
</page>
<page>
 <title>Chapter 2</title>
 <content>Welcome to Chapter 2</content>
</page>"""

soup = BeautifulSoup(data, "html.parser")

# Collect the text of every <title> element
title = [tag.get_text() for tag in soup.find_all("title")]

# Collect the text of every <content> element
content = [tag.get_text() for tag in soup.find_all("content")]

# Pair each title with the content at the same position
doc1 = list(zip(title, content))
for i in doc1:
    print(i)
```

Output:

```
('Chapter 1', 'Welcome to Chapter 1')
('Chapter 2', 'Welcome to Chapter 2')
```
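The `data` string in this answer has two top-level `<page>` elements, so it is not a well-formed XML document: `html.parser` tolerates that, but `ET.fromstring` raises a `ParseError`. A minimal sketch, assuming you prefer to stay with the standard library, wraps the fragment in a synthetic root element first (the `<root>` wrapper name is arbitrary):

```python
from xml.etree import ElementTree as ET

data = """<page>
  <title>Chapter 1</title>
  <content>Welcome to Chapter 1</content>
</page>
<page>
 <title>Chapter 2</title>
 <content>Welcome to Chapter 2</content>
</page>"""

# Parsing the fragment directly fails: XML allows only one top-level element
try:
    ET.fromstring(data)
except ET.ParseError as err:
    print("ParseError:", err)

# Wrapping the fragment in an arbitrary root element makes it well-formed
root = ET.fromstring("<root>" + data + "</root>")

# Read title and content from the same <page>, so the pairs cannot drift apart
pairs = [(page.findtext('title'), page.findtext('content'))
         for page in root.findall('page')]
for pair in pairs:
    print(pair)
```

Reading both fields from the same `<page>` also avoids the misalignment that `zip` over two separate lists can produce when one page is missing a tag.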


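To read the same document from a file, use `ET.parse` (this assumes `test.xml` contains the `<root>` document from the first example):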

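```python
# xml.etree.cElementTree was deprecated in Python 3.3 and removed in 3.9;
# ElementTree uses the C accelerator automatically when it is available.
from xml.etree import ElementTree as ET

tree = ET.parse("test.xml")
root = tree.getroot()

# Iterate over the direct <page> children of the root element
for page in root.findall('page'):
    print("Title: ", page.find('title').text)
    print("Content: ", page.find('content').text)
```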

Output:

```
Title:  Chapter 1
Content:  Welcome to Chapter 1
Title:  Chapter 2
Content:  Welcome to Chapter 2
```
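`ET.parse` builds the whole tree in memory. For very large files, a common alternative is `ET.iterparse`, which reports each element as soon as its closing tag is parsed, so every `<page>` can be handled and then cleared. A minimal sketch: an in-memory `io.BytesIO` stands in for `test.xml` so the example runs on its own, and the helper name `iter_pages` is hypothetical:

```python
import io
from xml.etree import ElementTree as ET


def iter_pages(source):
    """Hypothetical helper: yield (title, content) for each <page>.

    `source` may be a filename or a binary file object.
    """
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "page":
            yield elem.findtext("title"), elem.findtext("content")
            # Drop the finished <page>'s children and text to limit memory use
            elem.clear()


xml_bytes = b"""<root>
<page>
  <title>Chapter 1</title>
  <content>Welcome to Chapter 1</content>
</page>
<page>
  <title>Chapter 2</title>
  <content>Welcome to Chapter 2</content>
</page>
</root>"""

for title, content in iter_pages(io.BytesIO(xml_bytes)):
    print("Title: ", title)
    print("Content: ", content)
```

With a real file you would pass the path instead, e.g. `iter_pages("test.xml")`.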