A Tutorial on Python's Search Modules

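Why use Python's search modules?

Python programs frequently need to search for something, for example:

Finding a specific pattern or string in text
Finding file names that match a pattern
Locating files of a given type in a directory tree
Filtering and searching the elements of a data structure

The Python standard library provides several modules that handle these tasks efficiently, including re, fnmatch, and glob.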

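Core search modules

1. re module - regular expression search

The re module provides regular expression matching operations for string searches that are too complex for plain substring tests.

Basic usage: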

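import re

text = "Python is a powerful programming language; Python 3.9 was released in 2020"

# Find every non-overlapping match and return them as a list of strings
matches = re.findall(r'Python', text)
print(f"Found {len(matches)} matches: {matches}")

# Find the first match and get a match object (None if nothing matches)
match = re.search(r'\d+\.\d+', text)
if match:
    print(f"Found version number: {match.group()}")

# Replace every occurrence of the pattern
new_text = re.sub(r'Python', 'Java', text)
print(f"Text after replacement: {new_text}")

Typical uses: log analysis, data extraction, text processing, and input validation.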
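Input validation appears in the list of uses but is not shown in the example. The following is a minimal sketch, assuming a made-up username rule (3 to 16 ASCII letters, digits, or underscores). The names USERNAME_RE and is_valid_username are hypothetical and exist only for illustration; re.compile() and fullmatch() are the real re calls being demonstrated.

```python
import re

# Hypothetical rule: 3-16 characters, each an ASCII letter, digit, or underscore.
# Compiling once avoids re-parsing the pattern on every call.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,16}")

def is_valid_username(name: str) -> bool:
    """Return True only if the whole string satisfies the rule."""
    # fullmatch() requires the pattern to cover the entire string,
    # unlike search(), which accepts a match anywhere inside it.
    return USERNAME_RE.fullmatch(name) is not None

for candidate in ["alice_01", "ab", "bob smith"]:
    print(candidate, is_valid_username(candidate))
```

Using fullmatch() instead of search() is the important choice here: with search(), "bob smith" would pass because the substring "bob" satisfies the pattern.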

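2. fnmatch module - file name pattern matching

The fnmatch module matches names against Unix shell-style wildcards (*, ?, [seq], [!seq]) and suits simple file name matching.

Basic usage: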

import fnmatch
import os

files = os.listdir('.')
pattern = "*.py"

# Keep only the names that match the pattern
py_files = [f for f in files if fnmatch.fnmatch(f, pattern)]
print(f"Python files: {py_files}")

# The same kind of filtering with the built-in filter() function
txt_files = list(filter(lambda f: fnmatch.fnmatch(f, "*.txt"), files))
print(f"Text files: {txt_files}")

Typical uses: file filtering, simple pattern matching, and batch file operations.
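Beyond fnmatch.fnmatch(), the module offers a few other helpers that may be useful. The sketch below uses a hard-coded, hypothetical list of file names instead of os.listdir() so that the result does not depend on the current directory.

```python
import fnmatch
import re

# Hypothetical file names, chosen only for illustration.
names = ["main.py", "README.TXT", "notes.txt", "test_utils.py", "data1.csv"]

# fnmatch.filter() applies one pattern to a whole list of names.
py_files = fnmatch.filter(names, "*.py")

# fnmatchcase() always compares case-sensitively, whatever the operating system,
# so README.TXT is not matched by "*.txt".
lower_txt = [n for n in names if fnmatch.fnmatchcase(n, "*.txt")]

# translate() converts a shell-style pattern into a regular expression string,
# which can then be compiled and reused with the re module.
csv_re = re.compile(fnmatch.translate("data[0-9].csv"))
csv_files = [n for n in names if csv_re.match(n)]

print(py_files)
print(lower_txt)
print(csv_files)
```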

3. glob module - file path search

The glob module returns the path names that match a wildcard pattern.

Basic usage:

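import glob

# Find all Python files in the current directory
py_files = glob.glob("*.py")
print(f"Python files in the current directory: {py_files}")

# Recursively find text files in the current directory and all subdirectories
txt_files = glob.glob("**/*.txt", recursive=True)
print(f"All text files: {txt_files}")

# Find files named after a date, e.g. data_2023-05-15.csv
# (the pattern checks digit positions only, not whether the date is valid)
date_files = glob.glob("data_202[0-9]-[01][0-9]-[0-3][0-9].csv")
print(f"Date files: {date_files}")

Typical uses: locating files, batch file processing, and loading data.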
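For large directory trees, glob.iglob() may be preferable to glob.glob() because it yields paths one at a time instead of building the full list first. The sketch below builds a small temporary tree so it can run anywhere; the directory layout and file names are invented for the example.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a throwaway tree: a.txt, docs/b.txt, docs/old/c.txt, docs/d.md
    os.makedirs(os.path.join(tmp, "docs", "old"))
    for rel in ["a.txt", "docs/b.txt", "docs/old/c.txt", "docs/d.md"]:
        open(os.path.join(tmp, *rel.split("/")), "w").close()

    # With recursive=True, "**" matches zero or more directory levels,
    # so the top-level a.txt is included as well.
    pattern = os.path.join(tmp, "**", "*.txt")

    # iglob() returns an iterator; paths are converted to relative,
    # forward-slash form and sorted only to make the output stable.
    found = sorted(
        os.path.relpath(path, tmp).replace(os.sep, "/")
        for path in glob.iglob(pattern, recursive=True)
    )

print(found)
```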

Advanced search techniques

Combining the os and fnmatch modules

Combining os.walk() with fnmatch gives a recursive file search:

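import os
import fnmatch

def find_files(root_dir, pattern):
    """Return the paths of all files under root_dir whose names match pattern."""
    matches = []
    # os.walk() visits root_dir and every directory below it
    for root, _dirs, files in os.walk(root_dir):
        for filename in fnmatch.filter(files, pattern):
            matches.append(os.path.join(root, filename))
    return matches

# Find PNG images in the directory and all of its subdirectories
png_files = find_files("/path/to/directory", "*.png")
print(f"Found {len(png_files)} PNG files")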
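One possible variation on find_files is a generator that also skips directories that should not be searched. The sketch below assumes a hypothetical skip list (.git and __pycache__) and a hypothetical function name, iter_files. It relies on the documented behavior that, in the default top-down mode, os.walk() does not descend into directories removed in place from the dirs list.

```python
import fnmatch
import os
import tempfile

def iter_files(root_dir, pattern, skip_dirs=(".git", "__pycache__")):
    """Yield paths under root_dir whose file name matches pattern."""
    for root, dirs, files in os.walk(root_dir):
        # Editing dirs in place tells os.walk() not to descend into them.
        dirs[:] = [d for d in dirs if d not in skip_dirs]
        for filename in fnmatch.filter(files, pattern):
            yield os.path.join(root, filename)

with tempfile.TemporaryDirectory() as tmp:
    # Throwaway tree: img/logo.png, .git/cached.png, readme.md
    os.makedirs(os.path.join(tmp, "img"))
    os.makedirs(os.path.join(tmp, ".git"))
    for rel in ["img/logo.png", ".git/cached.png", "readme.md"]:
        open(os.path.join(tmp, *rel.split("/")), "w").close()

    # Convert to relative, forward-slash paths so the output is portable.
    found = [
        os.path.relpath(path, tmp).replace(os.sep, "/")
        for path in iter_files(tmp, "*.png")
    ]

print(found)
```

Because iter_files yields results lazily, a caller can stop at the first hit without walking the rest of the tree.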

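Advanced regular expression search

Named groups and more complex patterns make it possible to extract structured fields from text: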

import re

log_data = """
[2023-05-15 08:30:45] INFO: User 'alice' logged in from 192.168.1.101
[2023-05-15 08:45:12] ERROR: Database connection failed (user='bob')
[2023-05-15 09:15:33] WARNING: High memory usage detected (85%)
"""

# Regular expression with named groups: timestamp, level, message
pattern = r"\[(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (?P<level>\w+): (?P<message>.+)"

# finditer() yields one match object per log line
for match in re.finditer(pattern, log_data):
    print(f"Timestamp: {match.group('timestamp')}")
    print(f"Level: {match.group('level')}")
    print(f"Message: {match.group('message')}")
    print("-" * 50)
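Once the fields are captured by name, match.groupdict() turns each match into a plain dictionary, which makes further filtering and counting straightforward. The following is a sketch of one way to do that with the same sample log lines. The names LOG_RE, entries, level_counts, and errors are illustrative, and the timestamp group is loosened to "anything up to the closing bracket", which is an assumption about the log format.

```python
import re
from collections import Counter

log_data = """
[2023-05-15 08:30:45] INFO: User 'alice' logged in from 192.168.1.101
[2023-05-15 08:45:12] ERROR: Database connection failed (user='bob')
[2023-05-15 09:15:33] WARNING: High memory usage detected (85%)
"""

# re.MULTILINE makes ^ and $ match at the start and end of every line.
LOG_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] (?P<level>\w+): (?P<message>.+)$",
    re.MULTILINE,
)

# groupdict() returns {"timestamp": ..., "level": ..., "message": ...}
entries = [m.groupdict() for m in LOG_RE.finditer(log_data)]

# Count entries per level and collect the messages of ERROR entries.
level_counts = Counter(entry["level"] for entry in entries)
errors = [entry["message"] for entry in entries if entry["level"] == "ERROR"]

print(level_counts)
print(errors)
```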