Parsing XML and Extracting Data with Python: A Complete Tutorial | Python XML Processing Guide

JiJi
Python
2025-07-24

A Complete Guide to Parsing XML and Extracting Data with Python

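This tutorial explains how to use Python to parse XML files, extract data, handle namespaces, and modify XML documents. XML is a common data-interchange format, widely used in configuration files, web services, and data storage.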

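Choosing an XML library

Python offers several XML processing libraries. The most commonly used are:

Library	Advantages	Disadvantages	Best suited for
xml.etree.ElementTree	Part of the Python standard library, no installation needed	Relatively basic feature set (only a limited XPath subset)	Basic XML parsing needs
lxml	Feature-rich, with excellent performance	Must be installed separately	Complex XML processing, XPath 1.0 support
xml.dom (e.g. xml.dom.minidom)	DOM-style API in the standard library	High memory consumption	Tasks that operate on a complete DOM tree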

Installing an XML library

ElementTree is part of the Python standard library and needs no installation. The more advanced lxml library can be installed with pip:

pip install lxml
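
If a script has to run both with and without lxml installed, one common pattern is an import fallback. The sketch below is only an illustration: the flag name USING_LXML is made up for this example, and lxml-only features such as xpath() or pretty_print are not available when the fallback is taken.

```python
# Prefer lxml when it is installed; otherwise fall back to the standard library.
try:
    from lxml import etree as ET
    USING_LXML = True  # hypothetical flag name, used only in this sketch
except ImportError:
    import xml.etree.ElementTree as ET
    USING_LXML = False

# Basic parsing calls such as fromstring() exist in both libraries.
root = ET.fromstring("<bookstore><book/></bookstore>")
print(f"lxml available: {USING_LXML}, root tag: {root.tag}")
```

Code that relies on the fallback should stick to the calls the two libraries share (parse, fromstring, find, findall, get).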

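Sample XML file

We will use the following simple XML file as the example. The code in the later sections assumes it is saved as books.xml: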

<bookstore>
    <book category="cooking">
        <title lang="en">Everyday Italian</title>
        <author>Giada De Laurentiis</author>
        <year>2005</year>
        <price>30.00</price>
    </book>
    <book category="children">
        <title lang="en">Harry Potter</title>
        <author>J.K. Rowling</author>
        <year>2005</year>
        <price>29.99</price>
    </book>
    <book category="web">
        <title lang="en">Learning XML</title>
        <author>Erik T. Ray</author>
        <year>2003</year>
        <price>39.95</price>
    </book>
</bookstore>
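
To try the examples without creating a file first, the same document can be parsed from a string. This is a minimal sketch using the standard library's ET.fromstring; the variable name SAMPLE_XML is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# The sample document kept inline so the snippet runs on its own.
SAMPLE_XML = """\
<bookstore>
    <book category="cooking">
        <title lang="en">Everyday Italian</title>
        <author>Giada De Laurentiis</author>
        <year>2005</year>
        <price>30.00</price>
    </book>
    <book category="children">
        <title lang="en">Harry Potter</title>
        <author>J.K. Rowling</author>
        <year>2005</year>
        <price>29.99</price>
    </book>
    <book category="web">
        <title lang="en">Learning XML</title>
        <author>Erik T. Ray</author>
        <year>2003</year>
        <price>39.95</price>
    </book>
</bookstore>
"""

# fromstring() returns the root element directly (there is no tree object).
root = ET.fromstring(SAMPLE_XML)
print(root.tag, len(root))  # root tag and number of direct children
```

Unlike ET.parse, which returns an ElementTree, ET.fromstring returns the root Element itself, so no getroot() call is needed.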

Parsing XML with ElementTree

1 Import the library and load the XML

Parse the XML file with ElementTree:

import xml.etree.ElementTree as ET

tree = ET.parse('books.xml')
root = tree.getroot()
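
Loading can fail when the file is missing or the XML is malformed. The following sketch shows one way to guard the call; load_root is a hypothetical helper written for this example, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

def load_root(path):
    """Return the root element of the file at path, or None if loading fails.

    Hypothetical helper for illustration only.
    """
    try:
        return ET.parse(path).getroot()
    except FileNotFoundError:
        print(f"File not found: {path}")
    except ET.ParseError as exc:
        print(f"Malformed XML in {path}: {exc}")
    return None

# ET.fromstring raises the same ParseError for malformed text
# (here the <book> element is never closed).
try:
    ET.fromstring("<bookstore><book></bookstore>")
    malformed_detected = False
except ET.ParseError:
    malformed_detected = True

print(malformed_detected)
```

Whether to return None, re-raise, or log the error depends on the application; returning None is just the simplest choice for a sketch.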
                

2 Iterate over XML elements

Access the root element and iterate over its children:

# Iterate over all book elements
for book in root.findall('book'):
    title = book.find('title').text
    author = book.find('author').text
    price = book.find('price').text
    print(f"Title: {title}, Author: {author}, Price: {price}")
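
The loop above raises AttributeError if a child element is missing, because find() returns None in that case. A more defensive variant, sketched here, uses findtext() with a default and converts numeric fields. The first book in this inline sample deliberately has no author element to show the default; that omission is an assumption of this example, not part of the original file.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""
<bookstore>
    <book category="cooking">
        <title lang="en">Everyday Italian</title>
        <year>2005</year>
        <price>30.00</price>
    </book>
    <book category="web">
        <title lang="en">Learning XML</title>
        <author>Erik T. Ray</author>
        <year>2003</year>
        <price>39.95</price>
    </book>
</bookstore>
""")

records = []
for book in root.iter("book"):
    records.append({
        # findtext() returns the default instead of failing on a missing child
        "title": book.findtext("title", default="(untitled)"),
        "author": book.findtext("author", default="(unknown)"),
        # element text is always a string, so convert numbers explicitly
        "year": int(book.findtext("year", default="0")),
        "price": float(book.findtext("price", default="0")),
    })

total = sum(record["price"] for record in records)
print(records)
```

Collecting plain dictionaries like this makes the data easy to pass on to other code, for example for sorting or summing.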
                

3 Access element attributes

Get an element's attribute values:

for book in root.findall('book'):
    category = book.get('category')
    lang = book.find('title').get('lang')
    print(f"Category: {category}, Language: {lang}")
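
Attributes can also be used to select elements. ElementTree supports a limited XPath subset that includes [@attr='value'] predicates, so a sketch of filtering by category might look like this (the attribute name edition is hypothetical and only demonstrates the default value of get()):

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""
<bookstore>
    <book category="cooking"><title lang="en">Everyday Italian</title></book>
    <book category="children"><title lang="en">Harry Potter</title></book>
    <book category="web"><title lang="en">Learning XML</title></book>
</bookstore>
""")

# Select only the books whose category attribute equals 'web'.
web_books = root.findall("./book[@category='web']")
web_titles = [book.findtext("title") for book in web_books]

# .attrib exposes all attributes of an element as a dictionary.
categories = [book.attrib["category"] for book in root.findall("book")]

# get() accepts a default for attributes that are not present;
# 'edition' is a hypothetical attribute that the sample does not contain.
edition = root.find("book").get("edition", "n/a")

print(web_titles, categories, edition)
```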
                

Advanced features with lxml

The lxml library provides XPath support and more powerful parsing capabilities:

from lxml import etree

# Parse the XML; remove_blank_text drops whitespace-only text so that
# pretty_print can re-indent the output later
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse('books.xml', parser)
root = tree.getroot()

# Find elements with XPath: books whose price is greater than 35
expensive_books = root.xpath("//book[price > 35]")
for book in expensive_books:
    title = book.find('title').text
    price = book.find('price').text
    print(f"Expensive book: {title}, Price: {price}")

# Modify the XML content
for price in root.xpath("//price"):
    new_price = float(price.text) * 0.9  # apply a 10% discount
    price.text = f"{new_price:.2f}"  # always keep two decimal places

# Save the modified XML
tree.write('discounted_books.xml', pretty_print=True)
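
For comparison, a similar filter, discount, and save can be sketched with the standard library alone. ElementTree has no numeric XPath predicates, so the price filter is done in Python here. ET.indent requires Python 3.9 or later, and the in-memory buffer is an assumption of this sketch that stands in for a real output file such as discounted_books.xml.

```python
import io
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<bookstore>"
    "<book><title>Everyday Italian</title><price>30.00</price></book>"
    "<book><title>Learning XML</title><price>39.95</price></book>"
    "</bookstore>"
)

# No "price > 35" predicate in ElementTree, so compare in Python instead.
expensive = [
    book.findtext("title")
    for book in root.iter("book")
    if float(book.findtext("price")) > 35
]

# Apply a 10% discount and keep two decimal places.
for price in root.iter("price"):
    price.text = f"{float(price.text) * 0.9:.2f}"

# Pretty-print in place where supported (ET.indent exists from Python 3.9).
if hasattr(ET, "indent"):
    ET.indent(root)

# Write to an in-memory buffer; pass a file name instead to save to disk.
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)
output = buffer.getvalue().decode("utf-8")
print(output)
```

The lxml version stays shorter because XPath does the filtering; the standard library version avoids the extra dependency.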
        

Handling XML namespaces

When the XML contains namespaces, they need special handling:

<root xmlns:bk="http://example.com/books">
    <bk:book>
        <bk:title>Python Guide</bk:title>
    </bk:book>
</root>

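# Handle namespaces: map each prefix to its namespace URI.
# Here root must be the root element of the namespaced document above,
# not the books.xml tree used in the earlier sections.
namespaces = {'bk': 'http://example.com/books'}
titles = root.findall('bk:book/bk:title', namespaces)
for title in titles:
    print(title.text)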
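
A self-contained version of the namespace lookup, parsing the document above from a string, could look like the following sketch. It also shows that ElementTree stores namespaced tags in {uri}localname form, which can be used directly instead of a prefix mapping. The variable name NS_XML is an assumption of this example.

```python
import xml.etree.ElementTree as ET

NS_XML = """
<root xmlns:bk="http://example.com/books">
    <bk:book>
        <bk:title>Python Guide</bk:title>
    </bk:book>
</root>
"""

root = ET.fromstring(NS_XML)

# The prefix used in the query only has to match this mapping,
# not the prefix used inside the document.
namespaces = {"bk": "http://example.com/books"}
titles = [t.text for t in root.findall("bk:book/bk:title", namespaces)]

# Internally, tags are stored with the namespace URI in braces.
first_book_tag = root[0].tag

# The same lookup written with fully qualified tag names.
uri = "{http://example.com/books}"
qualified = root.find(f"{uri}book/{uri}title")

print(titles, first_book_tag, qualified.text)
```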
        

Note: When parsing large XML files, consider using the iterparse method for incremental parsing to reduce memory consumption.
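
As a rough illustration of incremental parsing, the sketch below sums the prices with ET.iterparse and clears each book element once it has been processed. The in-memory BytesIO object is an assumption of this example that stands in for a large file; in practice a file name or an open file would be passed instead.

```python
import io
import xml.etree.ElementTree as ET

# Small in-memory stand-in for a large XML file.
data = io.BytesIO(
    b"<bookstore>"
    b"<book><title>Everyday Italian</title><price>30.00</price></book>"
    b"<book><title>Harry Potter</title><price>29.99</price></book>"
    b"<book><title>Learning XML</title><price>39.95</price></book>"
    b"</bookstore>"
)

total = 0.0
count = 0
# The "end" event fires when an element's closing tag has been read,
# so all of its children are available at that point.
for event, elem in ET.iterparse(data, events=("end",)):
    if elem.tag == "book":
        total += float(elem.findtext("price"))
        count += 1
        # Drop the children and text of the processed book to free memory.
        elem.clear()

print(count, round(total, 2))
```

Note that clear() empties each book but the root still keeps a reference to the emptied elements; for very large files those references may need to be removed from the root as well.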