Python3 Xml Processing
## Python3.x Python3 XML Processing
* * *
XML stands for e**X**tensible **M**arkup **L**anguage, a subset of standard generalized markup language, which is a markup language used to mark electronic files and give them structure. You can learn (#) through this site.
XML was designed to transport and store data.
XML is a set of rules for defining semantic tags, which divide documents into many parts and identify these parts.
It is also a meta-markup language, which defines the syntax language for defining other domain-specific, semantic, structured markup languages.
* * *
## Python's XML Parsing
Common XML programming interfaces are DOM and SAX. These two interfaces handle XML files differently, and of course, they are used in different scenarios.
Python has three methods to parse XML: ElementTree, SAX, and DOM.
### 1. ElementTree
xml.etree.ElementTree is a module in Python's standard library for processing XML. It provides a simple and efficient API for parsing and generating XML documents.
### 2.SAX (simple API for XML )
Python's standard library includes a SAX parser. SAX uses an event-driven model, triggering events one by one during XML parsing and calling user-defined callback functions to process XML files.
### 3.DOM(Document Object Model)
Parse XML data in memory into a tree, and operate on XML through operations on the tree.
The XML example file **movies.xml** used in this chapter is as follows:
## Example
War, Thriller
DVD
2003
PG
10
Talk about a US-Japan war
Anime, Science Fiction
DVD
1989
R
8
A schientific fiction
Anime, Action
DVD
4
PG
10
Vash the Stampede!
Comedy
VHS
PG
2
Viewable boredom
* * *
## Python Using ElementTree to Parse XML
xml.etree.ElementTree is a module in Python's standard library for processing XML.
Here are some key concepts and usage of the xml.etree.ElementTree module:
ElementTree and Element Objects:
* **ElementTree**: The ElementTree class is a tree representation of an XML document. It contains one or more Element objects, representing the entire XML document.
* **Element**: The Element object represents an element in an XML document. Each element has a tag, a set of attributes, and zero or more child elements.
### Parsing XML
**fromstring() Method**: The fromstring() method can convert a string containing XML data into an Element object:
## Example
import xml.etree.ElementTree as ET
xml_string ='Some data'
root = ET.fromstring(xml_string)
**parse() Method**: If the XML data is stored in a file, you can use the parse() method to parse the entire XML document:
tree = ET.parse('example.xml') root = tree.getroot()
### Traversing XML Tree
**find() Method**: The find() method can find the first child element with a specified tag:
title_element = root.find('title')
**findall() Method**: The findall() method can find all child elements with a specified tag:
book_elements = root.findall('book')
### Accessing Element Attributes and Text Content
attrib attribute: You can access element attributes through the attrib attribute:
price = book_element.attrib['price']
text attribute: You can access element text content through the text attribute:
title_text = title_element.text
### Creating XML
**Element() Constructor**: You can create new elements using the Element() constructor:
new_element = ET.Element('new_element')
**SubElement() Function**: You can create child elements with specified tags using the SubElement() function:
new_sub_element = ET.SubElement(root, 'new_sub_element')
### Modifying XML
Modifying element attributes and text content: Directly modify the element's attrib and text attributes.
Removing elements: You can remove elements using the remove() method:
root.remove(title_element)
Simple reading of XML content:
## Example
import xml.etree.ElementTree as ET
# Define an XML string
xml_string ='''
Introduction to Python
John Doe
29.99
Data Science with Python
Jane Smith
39.95
'''
# Parse XML string using ElementTree
root = ET.fromstring(xml_string)
# Traverse XML tree
for book in root.findall('book'):
title = book.find('title').text
author = book.find('author').text
price = book.find('price').text
print(f'Title: {title}, Author: {author}, Price: {price}')
The output of executing the above code is:
Title: Introduction to Python, Author: John Doe, Price: 29.99Title: Data Science with Python, Author: Jane Smith, Price: 39.95
Below we will create an XML document containing book information, and then use the module to parse and manipulate it:
## Example
import xml.etree.ElementTree as ET
# Create an XML document
root = ET.Element('bookstore')
# Add the first book
book1 = ET.SubElement(root,'book')
title1 = ET.SubElement(book1,'title')
title1.text='Introduction to Python'
author1 = ET.SubElement(book1,'author')
author1.text='John Doe'
price1 = ET.SubElement(book1,'price')
price1.text='29.99'
# Add the second book
book2 = ET.SubElement(root,'book')
title2 = ET.SubElement(book2,'title')
title2.text='Data Science with Python'
author2 = ET.SubElement(book2,'author')
author2.text='Jane Smith'
price2 = ET.SubElement(book2,'price')
price2.text='39.95'
# Save XML document to file
tree = ET.ElementTree(root)
tree.write('books.xml')
# Parse XML document from file
parsed_tree = ET.parse('books.xml')
parsed_root = parsed_tree.getroot()
# Traverse XML tree and print book information
for book in parsed_root.findall('book'):
title = book.find('title').text
author = book.find('author').text
price = book.find('price').text
print(f'Title: {title}, Author: {author}, Price: {price}')
In the above example, we first create an XML document containing information about two books. Then, we save this document to a file named books.xml. Next, we use the ET.parse() method to parse the XML document from the file, and traverse the document tree to extract and print the title, author, and price information for each book.
* * *
## Python Using SAX to Parse XML
SAX is an event-driven API.
Parsing XML documents using SAX involves two parts: **parser** and **event handler**.
The parser is responsible for reading the XML document and sending events to the event handler, such as element start and element end events.
The event handler is responsible for responding to events and processing the transmitted XML data.
* 1, Processing large files;
* 2, Only need part of the file content, or only need specific information from the file.
* 3, When you want to build your own object model.
To use sax to process xml in Python, you need to first import the parse function from xml.sax, and ContentHandler from xml.sax.handler.
### ContentHandler Class Method Introduction
**characters(content) Method**
Timing of call:
When starting from the beginning of
YouTip