Python uses ElementTree to quickly parse XML files

1. Why XML parsing matters

Suppose you receive an XML file like this:

<bookstore>
  <book category="programming">
    <title>Python from Beginner to Mastery</title>
    <author>Zhang Wei</author>
    <year>2023</year>
  </book>
  <book category="novel">
    <title>Three-Body</title>
    <author>Liu Cixin</author>
    <year>2008</year>
  </book>
</bookstore>

Suppose you need to extract every title and author. How would you do it? Copy and paste by hand? That clearly stops working once the file grows to a few hundred MB. Python's ElementTree module (xml.etree.ElementTree) exists to solve exactly this kind of problem.

2. Getting started with ElementTree

1. Two ways to load XML

Method 1: Parse a string directly

import xml.etree.ElementTree as ET

xml_string = """
<bookstore>
  <book category="programming">
    <title>Python from Beginner to Mastery</title>
    <author>Zhang Wei</author>
  </book>
</bookstore>
"""

root = ET.fromstring(xml_string)  # Load from a string; returns the root Element
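Note that ET.fromstring() hands back the root Element itself, not an ElementTree wrapper, and that malformed input raises ET.ParseError. The sketch below shows one way to handle that; safe_fromstring is a hypothetical helper name, not part of the library.

```python
import xml.etree.ElementTree as ET

def safe_fromstring(text):
    """Hypothetical helper: return the root Element, or None if the XML is malformed."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as exc:
        # exc.position is a (line, column) tuple pointing at the error
        print(f"Malformed XML at line {exc.position[0]}, column {exc.position[1]}")
        return None

root = safe_fromstring("<bookstore><book category='novel'/></bookstore>")
print(root.tag)                              # bookstore
print(safe_fromstring("<bookstore><book>"))  # prints the error message, then None
```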

Method 2: Read an XML file

tree = ET.parse('')  # Load from a file; returns an ElementTree
root = tree.getroot()  # Get the root Element
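ET.parse() accepts either a file name or an open binary file object. Because the original file name is not preserved here, this sketch uses io.BytesIO as a stand-in for a real file so that it runs on its own:

```python
import io
import xml.etree.ElementTree as ET

# io.BytesIO stands in for a real file on disk
data = b"<bookstore><book category='novel'><title>Three-Body</title></book></bookstore>"
tree = ET.parse(io.BytesIO(data))  # an ElementTree representing the whole document
root = tree.getroot()              # the <bookstore> Element

print(type(tree).__name__, root.tag)  # ElementTree bookstore
print(root.find("book/title").text)   # Three-Body
```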

2. Traversing XML nodes

Iterate over all book nodes:

for book in root.findall('book'):
    print("Found a book:")
    print(f"Category: {book.get('category')}")
    print(f"Book title: {book.find('title').text}")
    print(f"Author: {book.find('author').text}")

Output (using the two-book file shown at the beginning):

Found a book:
Category: programming
Book title: Python from Beginner to Mastery
Author: Zhang Wei
Found a book:
Category: novel
Book title: Three-Body
Author: Liu Cixin
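One detail worth knowing: findall('book') only searches the direct children of the element you call it on, whereas iter('book') walks the entire subtree. A minimal sketch, assuming a hypothetical <shelf> wrapper element that is not part of the original sample:

```python
import xml.etree.ElementTree as ET

xml_string = """
<bookstore>
  <shelf>
    <book category="novel"><title>Three-Body</title></book>
  </shelf>
  <book category="programming"><title>Python from Beginner to Mastery</title></book>
</bookstore>
"""
root = ET.fromstring(xml_string)

# findall('book') only looks at direct children of <bookstore>
direct = [b.find("title").text for b in root.findall("book")]
# iter('book') walks the whole subtree in document order, at any depth
anywhere = [b.find("title").text for b in root.iter("book")]

print(direct)    # ['Python from Beginner to Mastery']
print(anywhere)  # ['Three-Body', 'Python from Beginner to Mastery']
```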

3. ElementTree core operations in detail

1. Three ways to find elements

# Find the first matching node
first_book = root.find('book')

# Find all matching nodes
all_books = root.findall('book')

# Use an XPath expression (ElementTree supports a limited subset of XPath)
titles = root.findall('.//title')  # Find all title nodes at any depth
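ElementTree's XPath subset covers common predicates such as [@attr='value'] and [child='text']. A small sketch against the sample bookstore; the specific queries are illustrative:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""
<bookstore>
  <book category="programming"><title>Python from Beginner to Mastery</title><year>2023</year></book>
  <book category="novel"><title>Three-Body</title><year>2008</year></book>
</bookstore>
""")

# Attribute predicate: books whose category attribute equals 'novel'
novels = root.findall(".//book[@category='novel']")
# Child-text predicate: books that have a <year> child whose text is '2023'
recent = root.findall(".//book[year='2023']")
# findtext() returns the text of the first match (or a default)
first_title = root.findtext("book/title")

print([b.findtext("title") for b in novels])  # ['Three-Body']
print([b.findtext("title") for b in recent])  # ['Python from Beginner to Mastery']
print(first_title)                            # Python from Beginner to Mastery
```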

2. Get node attributes and text

# Get an attribute
category = book.get('category')

# Get text content
title = book.find('title').text

# Handle nodes that may not exist (find() returns None)
year = book.find('year')
if year is not None:
    print(year.text)
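Two shortcuts avoid most None checks: Element.get() and Element.findtext() both accept a default value. A minimal sketch; the lang and isbn attributes are hypothetical additions to the sample:

```python
import xml.etree.ElementTree as ET

book = ET.fromstring('<book category="novel" lang="zh"><title>Three-Body</title></book>')

print(book.attrib)                           # {'category': 'novel', 'lang': 'zh'}
print(book.get("isbn", "unknown"))           # missing attribute -> 'unknown'
print(book.findtext("year", default="n/a"))  # missing child -> 'n/a', no AttributeError
print(book.findtext("title"))                # Three-Body
```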

3. Handle namespaces

What if the XML uses a namespace?

<ns:book xmlns:ns="">
  <ns:title>XML Parsing Guide</ns:title>
</ns:book>

Parse it like this:

ns = {'ns': ''}  # Map the prefix to the namespace URI declared in the document
title = root.find('ns:title', ns).text
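Because the namespace URI is elided in the sample above, the sketch below assumes a hypothetical URI, http://example.com/books. It shows that ElementTree stores qualified names as {uri}localname, and that the prefix in your mapping does not have to match the one used in the document:

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; substitute the one declared in your document
xml_string = """
<ns:book xmlns:ns="http://example.com/books">
  <ns:title>XML Parsing Guide</ns:title>
</ns:book>
"""
root = ET.fromstring(xml_string)

# Internally, qualified names are stored in "{uri}localname" form
print(root.tag)  # {http://example.com/books}book

# Option 1: a prefix mapping (the prefix 'b' need not match the document's 'ns')
ns = {"b": "http://example.com/books"}
print(root.find("b:title", ns).text)  # XML Parsing Guide

# Option 2: spell out the {uri} form directly
print(root.find("{http://example.com/books}title").text)  # XML Parsing Guide
```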

4. Hands-on: parsing real-world XML

Suppose you want to process an RSS feed (RSS is just XML):

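import requests
import xml.etree.ElementTree as ET

url = "/rss"
response = requests.get(url)
response.raise_for_status()  # Stop early on HTTP errors
root = ET.fromstring(response.content)  # Parse the raw bytes of the response

for item in root.findall('.//item'):
    print(f"Title: {item.find('title').text}")
    print(f"Link: {item.find('link').text}")
    print("----")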
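To try the loop without a network request, you can feed it a local RSS document. This sketch assumes a hypothetical minimal RSS 2.0 feed in place of response.content, and uses findtext() with a default so that an item without a <link> does not crash the loop:

```python
import xml.etree.ElementTree as ET

# A hypothetical, minimal RSS 2.0 document standing in for response.content
rss_bytes = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First post</title><link>https://example.com/1</link></item>
    <item><title>Second post</title></item>
  </channel>
</rss>"""

def parse_items(data):
    """Return (title, link) pairs; findtext() tolerates a missing <link>."""
    root = ET.fromstring(data)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]

for title, link in parse_items(rss_bytes):
    print(f"Title: {title}")
    print(f"Link: {link}")
    print("----")
```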

5. Performance tips

When processing large XML files (hundreds of MB or more):

1. Use iterative parsing

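for event, elem in ET.iterparse('big_file.xml'):  # Yields ('end', element) by default
    if elem.tag == 'book':
        print(elem.find('title').text)
        elem.clear()  # Free the element's contents as soon as it is processed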
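elem.clear() empties each <book>, but the emptied elements stay attached to the root, so memory can still creep up on very large files. A common refinement is to grab the root from the first 'start' event and clear it as you go. A sketch under that assumption; iter_titles is a hypothetical name and io.BytesIO stands in for big_file.xml:

```python
import io
import xml.etree.ElementTree as ET

def iter_titles(source):
    """Yield book titles one at a time while keeping memory flat.

    `source` may be a file name or a binary file object.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the 'start' of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "book":
            yield elem.findtext("title")
            # elem.clear() alone would leave an empty <book> in root;
            # clearing the root drops those references as well
            root.clear()

# io.BytesIO stands in for 'big_file.xml' so the sketch runs on its own
data = b"<bookstore>" + b"".join(
    b"<book><title>Book %d</title></book>" % i for i in range(3)
) + b"</bookstore>"

for title in iter_titles(io.BytesIO(data)):
    print(title)
```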

2. Use lxml for speed

from lxml import etree  # Requires installation: pip install lxml
# Often noticeably faster than the standard library
parser = etree.XMLParser(remove_blank_text=True)  # Drop ignorable whitespace
tree = etree.parse('', parser)
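If lxml may or may not be installed, one common pattern is to try importing it and fall back to the standard library. This only works for code that sticks to calls both APIs share, such as fromstring() and iter(); load_titles is a hypothetical helper:

```python
# Prefer lxml when it is installed, otherwise fall back to the standard library
try:
    from lxml import etree  # third-party: pip install lxml
except ImportError:
    import xml.etree.ElementTree as etree

def load_titles(xml_bytes):
    """Return every <title> text; fromstring() and iter() exist in both APIs."""
    root = etree.fromstring(xml_bytes)
    return [t.text for t in root.iter("title")]

print(etree.__name__)  # 'lxml.etree' or 'xml.etree.ElementTree'
print(load_titles(b"<bookstore><book><title>Three-Body</title></book></bookstore>"))
```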

6. Frequently Asked Questions

Question 1: What if the encoding is wrong?

with open('', 'r', encoding='utf-8') as f:  # Decode with the encoding you know is correct
    tree = ET.parse(f)
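Opening the file in text mode as above works when you know the real encoding. Another option is to hand the parser raw bytes so it can honour the encoding named in the XML declaration. A sketch of both cases with illustrative sample strings; Expat natively supports only a few encodings (UTF-8, UTF-16, ISO-8859-1, US-ASCII), which is why the undeclared GBK bytes are decoded manually:

```python
import xml.etree.ElementTree as ET

# 1) Raw bytes: the parser reads the encoding from the XML declaration
latin1 = b'<?xml version="1.0" encoding="ISO-8859-1"?><title>caf\xe9</title>'
print(ET.fromstring(latin1).text)  # café

# 2) Bytes in an encoding the document does not declare (here GBK):
#    decode them yourself, then parse the resulting str
gbk_bytes = "<title>三体</title>".encode("gbk")
print(ET.fromstring(gbk_bytes.decode("gbk")).text)  # 三体
```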

Question 2: Escaping special characters

from xml.sax.saxutils import escape
safe_text = escape('Text & Special Characters<>"')  # Converts &, < and > to entities
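By default escape() only converts &, < and >, so the double quote in the example above is left untouched. A short sketch comparing escape(), quoteattr(), and ElementTree's own serializer, which escapes text and attribute values automatically:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

raw = 'Text & Special Characters<>"'

# escape() handles &, < and > by default; extra characters need a mapping
print(escape(raw))                   # Text &amp; Special Characters&lt;&gt;"
print(escape(raw, {'"': "&quot;"}))  # Text &amp; Special Characters&lt;&gt;&quot;
# quoteattr() escapes the value and wraps it in a suitable quote character
print(quoteattr('say "hi"'))         # 'say "hi"'

# When you build XML with ElementTree itself, serialization escapes for you
elem = ET.Element("note", {"title": 'x "y"'})
elem.text = "a & b"
print(ET.tostring(elem, encoding="unicode"))  # <note title="x &quot;y&quot;">a &amp; b</note>
```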

Question 3: Pretty-print the output

from xml.dom import minidom
xml_str = ET.tostring(root, encoding='unicode')  # Serialize the tree to a str
pretty_xml = minidom.parseString(xml_str).toprettyxml()
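On Python 3.9 and later, ET.indent() can pretty-print a tree in place without the minidom round-trip (which also prepends an XML declaration and indents with tabs by default). A minimal sketch:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<bookstore><book><title>Three-Body</title></book></bookstore>")

# Python 3.9+: insert indentation whitespace into the tree in place
ET.indent(root, space="  ")
print(ET.tostring(root, encoding="unicode"))
```

This prints each nested element on its own line, indented by two spaces per level.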

7. Complete code example

import xml.etree.ElementTree as ET

def parse_xml(file_path):
    tree = ET.parse(file_path)
    root = tree.getroot()

    results = []
    for book in root.findall('book'):
        year = book.find('year')  # May be missing
        data = {
            'category': book.get('category'),
            'title': book.find('title').text,
            'author': book.find('author').text,
            'year': year.text if year is not None else None
        }
        results.append(data)

    return results

# Usage example
books = parse_xml('')
for book in books:
    print(f"{book['title']} ({book['year']})")
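The version above assumes every book has a <title> and an <author>; if either is missing, .text raises AttributeError. A variant sketch using findtext(), which returns None for a missing child. parse_books is a hypothetical name, and io.BytesIO stands in for the elided file path:

```python
import io
import xml.etree.ElementTree as ET

def parse_books(source):
    """Variant of parse_xml(): `source` is a file name or binary file object.

    findtext() returns None for a missing child instead of raising
    AttributeError, so every field is handled the same way.
    """
    root = ET.parse(source).getroot()
    return [
        {
            "category": book.get("category"),
            "title": book.findtext("title"),
            "author": book.findtext("author"),
            "year": book.findtext("year"),
        }
        for book in root.findall("book")
    ]

data = b"""<bookstore>
  <book category="programming"><title>Python from Beginner to Mastery</title>
    <author>Zhang Wei</author><year>2023</year></book>
  <book category="novel"><title>Three-Body</title><author>Liu Cixin</author></book>
</bookstore>"""

for book in parse_books(io.BytesIO(data)):
    print(f"{book['title']} ({book['year']})")
```

Here the second book has no <year>, so it prints with None instead of failing.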

8. Summary

ElementTree is the go-to tool for handling XML in Python because it is:

Simple and easy to use: a few lines of code can parse complex XML
Full-featured: supports XPath expressions (a practical subset) and namespaces
Scalable: iterparse() keeps memory usage low on very large files, and the compatible lxml library can take over when raw speed matters

Remember these key points:

Use ET.parse() for small files
Use ET.iterparse() for large files
Use lxml when performance is critical
