Extracting text from XML using python

python xml

Python already ships with a built-in XML library, ElementTree (xml.etree.ElementTree). For example:

>>> from xml.etree import ElementTree as ET
>>> xmlstr = """
... <root>
... <page>
...   <title>Chapter 1</title>
...   <content>Welcome to Chapter 1</content>
... </page>
... <page>
...  <title>Chapter 2</title>
...  <content>Welcome to Chapter 2</content>
... </page>
... </root>
... """
>>> root = ET.fromstring(xmlstr)
>>> for page in root:
...     title = page.find('title').text
...     content = page.find('content').text
...     print('title: %s; content: %s' % (title, content))
...
title: Chapter 1; content: Welcome to Chapter 1
title: Chapter 2; content: Welcome to Chapter 2
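One caveat with the loop above: page.find('title') returns None when a <page> has no <title>, so the .text lookup then raises AttributeError. Here is a minimal sketch of a more forgiving version. It assumes that missing elements should simply become empty strings, and it uses Element.findtext() with a default. The extract_pages helper and the sample XML are hypothetical:

```python
from xml.etree import ElementTree as ET

# Hypothetical input: the second <page> deliberately has no <content> element
xmlstr = """<root>
  <page><title>Chapter 1</title><content>Welcome to Chapter 1</content></page>
  <page><title>Chapter 2</title></page>
</root>"""

def extract_pages(xml_text):
    """Return a list of (title, content) tuples, one per <page> child of the root."""
    root = ET.fromstring(xml_text)
    pages = []
    for page in root.findall("page"):
        # findtext() returns the default instead of failing when the tag is absent
        title = page.findtext("title", default="")
        content = page.findtext("content", default="")
        pages.append((title, content))
    return pages

for title, content in extract_pages(xmlstr):
    print("title: %s; content: %s" % (title, content))
```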

You can also extract the text with BeautifulSoup:

from bs4 import BeautifulSoup

data = """<page>
  <title>Chapter 1</title>
  <content>Welcome to Chapter 1</content>
</page>
<page>
 <title>Chapter 2</title>
 <content>Welcome to Chapter 2</content>
</page>"""

soup = BeautifulSoup(data, "html.parser")

# Text of every <title> tag, in document order
titles = [tag.get_text() for tag in soup.find_all("title")]

# Text of every <content> tag, in document order
contents = [tag.get_text() for tag in soup.find_all("content")]

# Pair each title with its content and print the pairs
for pair in zip(titles, contents):
    print(pair)

Output:

('Chapter 1', 'Welcome to Chapter 1')
('Chapter 2', 'Welcome to Chapter 2')
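The BeautifulSoup example works on data that has two top-level <page> elements and no single root. That is not a well-formed XML document, so ET.fromstring(data) would raise a ParseError on it. If you would rather stay with the standard library, one option is to wrap the fragment in a synthetic root element first. This sketch assumes the fragment has no XML declaration or DOCTYPE of its own. The pages_from_fragment helper is a hypothetical name:

```python
from xml.etree import ElementTree as ET

# The same fragment as above: two top-level <page> elements, no single root
data = """<page>
  <title>Chapter 1</title>
  <content>Welcome to Chapter 1</content>
</page>
<page>
 <title>Chapter 2</title>
 <content>Welcome to Chapter 2</content>
</page>"""

def pages_from_fragment(fragment):
    """Wrap the fragment in a synthetic <root> so it parses as one XML document."""
    root = ET.fromstring("<root>" + fragment + "</root>")
    # Pair each page's <title> text with its <content> text
    return [(page.findtext("title"), page.findtext("content"))
            for page in root.iter("page")]

for pair in pages_from_fragment(data):
    print(pair)
```

Called on the same data, this should print the same two tuples as the BeautifulSoup version.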

Alternatively, parse the XML from a file (here test.xml) with ElementTree:

from xml.etree import ElementTree as ET

tree = ET.parse("test.xml")
root = tree.getroot()
for page in root.findall('page'):
    print("Title: ", page.find('title').text)
    print("Content: ", page.find('content').text)

Output:

Title:  Chapter 1
Content:  Welcome to Chapter 1
Title:  Chapter 2
Content:  Welcome to Chapter 2
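For large files, ET.parse() builds the whole tree in memory before you can read anything. ET.iterparse() instead reports each element as soon as its closing tag has been parsed, so pages can be handled one at a time. This sketch assumes test.xml has the <root>/<page> layout from the first answer. The iter_pages helper is hypothetical, and an in-memory BytesIO stands in for the file so the snippet runs on its own:

```python
import io
from xml.etree import ElementTree as ET

def iter_pages(source):
    """Yield (title, content) for each <page> as soon as that page is parsed.

    `source` may be a filename such as "test.xml" or a binary file object.
    """
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "page":
            # At the page's "end" event its <title> and <content> children are complete
            yield elem.findtext("title"), elem.findtext("content")
            elem.clear()  # free the finished page's children

# In-memory stand-in for test.xml (hypothetical sample data)
sample = b"""<root>
  <page><title>Chapter 1</title><content>Welcome to Chapter 1</content></page>
  <page><title>Chapter 2</title><content>Welcome to Chapter 2</content></page>
</root>"""

for title, content in iter_pages(io.BytesIO(sample)):
    print("Title: ", title)
    print("Content: ", content)
```

With a real file, pass the filename instead: iter_pages("test.xml"). iterparse accepts either a filename or a file object.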
