Python Collections Module: A Guide to Counter, defaultdict, deque, and namedtuple

Name: Soren Atelier

Updated on 2026/2/18

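Python's built-in data structures (lists, dicts, tuples, sets) cover most tasks. But once your code moves beyond toy examples, you start to hit their limits: counting elements means hand-writing a dictionary loop; grouping data litters your code with defensive if key not in dict checks; using a list as a queue costs O(n) for every pop from the front; and representing structured records as plain tuples turns field access into a hard-to-read game of guess-the-index. Each workaround is small on its own, but they add up quickly and make the code harder to read, slower to run, and easier to get wrong.

The collections module in Python's standard library addresses these pain points with a set of purpose-built container types: Counter counts in a single line; defaultdict supplies automatic defaults so that missing keys no longer raise KeyError; deque makes operations at both ends of a sequence O(1); namedtuple adds field names to tuples without the overhead of a full class; and OrderedDict and ChainMap handle order-sensitive operations and layered lookups, patterns that a plain dict struggles to express cleanly.

This guide covers every major class in the collections module, with runnable code, performance analysis, and real-world usage. Whether you are processing log files, building caches, managing layered configuration, or assembling a data processing pipeline, these containers can make your code shorter, faster, and more reliable.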

Overview of the collections Module

The collections module provides specialized container data types that extend Python's general-purpose built-in containers.

import collections
 
# See all available classes
print([name for name in dir(collections) if not name.startswith('_')])
# ['ChainMap', 'Counter', 'OrderedDict', 'UserDict', 'UserList',
#  'UserString', 'abc', 'defaultdict', 'deque', 'namedtuple']

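Class	Purpose	Replaces
`Counter`	Counts hashable objects	Hand-written dict counting loops
`defaultdict`	dict with automatic default values	`dict.setdefault()`, `if key not in` checks
`deque`	Double-ended queue with O(1) operations at both ends	`list` used as a queue/stack
`namedtuple`	tuple with named fields	Plain tuples, simple data classes
`OrderedDict`	dict with order-aware operations and equality	`dict` (before 3.7), order-dependent operations
`ChainMap`	Layered dictionary lookup	Manually merging dicts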

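Counter: Counting Elements

Counter is a dict subclass for counting hashable objects. It maps each element to the number of times it occurs and provides common methods for frequency analysis.

Creating a Counter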

from collections import Counter
 
# From an iterable
words = ['apple', 'banana', 'apple', 'cherry', 'banana', 'apple']
word_count = Counter(words)
print(word_count)
# Counter({'apple': 3, 'banana': 2, 'cherry': 1})
 
# From a string
letter_count = Counter('mississippi')
print(letter_count)
# Counter({'i': 4, 's': 4, 'p': 2, 'm': 1})  (ties keep first-seen order)
 
# From a dictionary
inventory = Counter({'shirts': 25, 'pants': 15, 'hats': 10})
 
# From keyword arguments
stock = Counter(laptops=5, monitors=12)

most_common() and Frequency Ordering

from collections import Counter
 
text = "to be or not to be that is the question"
words = Counter(text.split())
 
# Get the 3 most common words
print(words.most_common(3))
# [('to', 2), ('be', 2), ('or', 1)]
 
# Get all elements sorted by frequency
print(words.most_common())
# [('to', 2), ('be', 2), ('or', 1), ('not', 1), ('that', 1), ('is', 1), ('the', 1), ('question', 1)]
 
# Least common: reverse the list or slice from the end
print(words.most_common()[-3:])
# [('is', 1), ('the', 1), ('question', 1)]

Counter Operations

Counter supports addition, subtraction, intersection, and union, so you can treat it as a multiset.

from collections import Counter
 
a = Counter(x=4, y=2, z=1)
b = Counter(x=1, y=3, z=5)
 
# Addition: combine counts
print(a + b)  # Counter({'z': 6, 'x': 5, 'y': 5})
 
# Subtraction: drops zero and negative results
print(a - b)  # Counter({'x': 3})
 
# Intersection (min of each)
print(a & b)  # Counter({'y': 2, 'x': 1, 'z': 1})
 
# Union (max of each)
print(a | b)  # Counter({'z': 5, 'x': 4, 'y': 3})
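
Beyond the operators, a few Counter methods round out the multiset picture. The following is a small sketch with made-up counts (the stock variable and its values are illustrative only): elements() expands counts back into items, subtract() keeps zero and negative counts where the - operator would drop them, and update() adds to existing counts in place.

```python
from collections import Counter

stock = Counter(apples=3, pears=1)

# elements() repeats each key by its count (keys with count < 1 are skipped)
expanded = sorted(stock.elements())
print(expanded)  # ['apples', 'apples', 'apples', 'pears']

# subtract() works in place and keeps zero and negative counts
stock.subtract(Counter(apples=1, pears=2))
print(stock)  # Counter({'apples': 2, 'pears': -1})

# update() adds to existing counts instead of replacing them
stock.update(pears=4)
print(stock['pears'])  # 3

# Unary plus returns a new Counter without zero or negative entries
stock.subtract(apples=2)
print(+stock)  # Counter({'pears': 3})
```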

Practical Counter Patterns

from collections import Counter
 
# Frequency analysis of log levels
log_entries = [
    "ERROR: disk full",
    "WARNING: high memory",
    "ERROR: disk full",
    "ERROR: timeout",
    "WARNING: high memory",
    "ERROR: disk full",
    "INFO: backup complete",
]
error_types = Counter(entry.split(":")[0].strip() for entry in log_entries)
print(error_types)
# Counter({'ERROR': 4, 'WARNING': 2, 'INFO': 1})
 
# Find unique elements (count == 1)
data = [1, 2, 3, 2, 1, 4, 5, 4]
unique = [item for item, count in Counter(data).items() if count == 1]
print(unique)  # [3, 5]
 
# Anagram check: two words are anagrams when their letter counts are equal
def is_anagram(word1, word2):
    return Counter(word1.lower()) == Counter(word2.lower())
 
print(is_anagram("listen", "silent"))  # True
print(is_anagram("hello", "world"))    # False
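
The anagram check compares two Counters for equality. A true subset (multiset containment) check is a small variation on it. The sketch below uses a hypothetical helper, can_build, and relies only on Counter subtraction, which drops non-positive counts. On Python 3.10+ the same test can also be written as needed <= available, since Counter gained rich comparisons in that release.

```python
from collections import Counter

def can_build(word, letters):
    """Return True if `word` can be spelled using only the available `letters`."""
    needed = Counter(word.lower())
    available = Counter(letters.lower())
    # Subtraction drops non-positive counts, so an empty result means
    # every needed letter is covered by the available ones.
    return not (needed - available)

print(can_build("tile", "listen"))  # True
print(can_build("teen", "listen"))  # False (only one 'e' is available)
```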

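For a deeper look at Counter, see our dedicated Python Counter guide.

defaultdict: Automatic Default Values

defaultdict is a dict subclass that calls a factory function to supply a default value when a missing key is accessed, which removes the need for KeyError handling and defensive checks.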

Factory functions

from collections import defaultdict
 
# int factory: default is 0
counter = defaultdict(int)
counter['apples'] += 1
counter['oranges'] += 3
print(dict(counter))  # {'apples': 1, 'oranges': 3}
 
# list factory: default is []
groups = defaultdict(list)
pairs = [('fruit', 'apple'), ('veggie', 'carrot'), ('fruit', 'banana'), ('veggie', 'pea')]
for category, item in pairs:
    groups[category].append(item)
print(dict(groups))
# {'fruit': ['apple', 'banana'], 'veggie': ['carrot', 'pea']}
 
# set factory: default is set()
index = defaultdict(set)
words = [('file1', 'python'), ('file2', 'python'), ('file1', 'java'), ('file3', 'python')]
for filename, lang in words:
    index[lang].add(filename)
print(dict(index))
# {'python': {'file1', 'file2', 'file3'}, 'java': {'file1'}}

The Grouping Pattern

Grouping related items together is the most common use of defaultdict(list). Compare it with the manual approach:

from collections import defaultdict
 
students = [
    ('Math', 'Alice'), ('Science', 'Bob'), ('Math', 'Charlie'),
    ('Science', 'Diana'), ('Math', 'Eve'), ('History', 'Frank'),
]
 
# Without defaultdict -- verbose and error-prone
groups_manual = {}
for subject, name in students:
    if subject not in groups_manual:
        groups_manual[subject] = []
    groups_manual[subject].append(name)
 
# With defaultdict -- clean and direct
groups = defaultdict(list)
for subject, name in students:
    groups[subject].append(name)
 
print(dict(groups))
# {'Math': ['Alice', 'Charlie', 'Eve'], 'Science': ['Bob', 'Diana'], 'History': ['Frank']}

Nested defaultdict

You can build multi-level data structures without manually initializing each level.

from collections import defaultdict
 
# Two-level nested defaultdict
def nested_dict():
    return defaultdict(int)
 
sales = defaultdict(nested_dict)
sales['2025']['Q1'] = 150000
sales['2025']['Q2'] = 175000
sales['2026']['Q1'] = 200000
print(sales['2025']['Q1'])  # 150000
print(sales['2024']['Q3'])  # 0 (auto-created, no KeyError)
 
# Arbitrary depth nesting with a recursive factory
def deep_dict():
    return defaultdict(deep_dict)
 
config = deep_dict()
config['database']['primary']['host'] = 'localhost'
config['database']['primary']['port'] = 5432
config['database']['replica']['host'] = 'replica.local'
print(config['database']['primary']['host'])  # localhost

Custom Factory Functions

from collections import defaultdict
 
# Lambda for custom defaults
scores = defaultdict(lambda: 100)  # Every student starts with 100
scores['Alice'] -= 5
scores['Bob'] -= 10
print(scores['Charlie'])  # 100 (new student gets default)
print(dict(scores))  # {'Alice': 95, 'Bob': 90, 'Charlie': 100}
 
# Named function for complex defaults
def default_user():
    return {'role': 'viewer', 'active': True, 'login_count': 0}
 
users = defaultdict(default_user)
users['alice']['role'] = 'admin'
print(users['bob'])  # {'role': 'viewer', 'active': True, 'login_count': 0}
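
One side effect is worth keeping in mind: indexing a missing key does not just return the default, it also stores it. The sketch below (the visits counter is an illustrative name) shows that behavior, along with two ways to read without inserting.

```python
from collections import defaultdict

visits = defaultdict(int)
visits['home'] += 1

# Indexing a missing key calls the factory and stores the result
_ = visits['about']
print(dict(visits))  # {'home': 1, 'about': 0}

# Membership tests and .get() do not trigger the factory
print('contact' in visits)    # False
print(visits.get('contact'))  # None
print(len(visits))            # 2

# Converting to a plain dict restores KeyError for missing keys
plain = dict(visits)
try:
    plain['contact']
except KeyError:
    print("plain dict raises KeyError again")
```

Converting to a plain dict once the structure is built is a common way to stop accidental key creation in later code.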

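For more patterns, see the Python defaultdict guide.

deque: Double-Ended Queue

deque (pronounced "deck") provides O(1) append and pop operations at both ends. For a list, pop(0) and insert(0, x) are O(n) because every remaining element has to be shifted. Whenever your workload frequently operates on both ends of a sequence, deque is the right choice.

Core Operations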

from collections import deque
 
d = deque([1, 2, 3, 4, 5])
 
# O(1) operations on both ends
d.append(6)         # Add to right: [1, 2, 3, 4, 5, 6]
d.appendleft(0)     # Add to left:  [0, 1, 2, 3, 4, 5, 6]
 
right = d.pop()     # Remove from right: 6
left = d.popleft()  # Remove from left:  0
print(d)  # deque([1, 2, 3, 4, 5])
 
# Extend from both sides
d.extend([6, 7])          # Right extend: [1, 2, 3, 4, 5, 6, 7]
d.extendleft([-1, 0])     # Left extend (reversed): [0, -1, 1, 2, 3, 4, 5, 6, 7]

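Bounded deque with maxlen

When maxlen is set, appending to a full deque automatically discards an element from the opposite end. This makes it a good fit for sliding windows and caches.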

from collections import deque
 
# Keep only the last 5 items
recent = deque(maxlen=5)
for i in range(10):
    recent.append(i)
 
print(recent)  # deque([5, 6, 7, 8, 9], maxlen=5)
 
# Sliding window average
def moving_average(iterable, window_size):
    window = deque(maxlen=window_size)
    for value in iterable:
        window.append(value)
        if len(window) == window_size:
            yield sum(window) / window_size
 
data = [10, 20, 30, 40, 50, 60, 70]
print(list(moving_average(data, 3)))
# [20.0, 30.0, 40.0, 50.0, 60.0]

Rotation

rotate(n) shifts elements n steps to the right; a negative value rotates to the left.

from collections import deque
 
d = deque([1, 2, 3, 4, 5])
 
d.rotate(2)   # Rotate right by 2
print(d)  # deque([4, 5, 1, 2, 3])
 
d.rotate(-3)  # Rotate left by 3
print(d)  # deque([2, 3, 4, 5, 1])

deque vs list Performance

from collections import deque
import time
 
# Benchmark: append/pop from left side
n = 100_000
 
# List: O(n) for each insert at position 0
start = time.perf_counter()
lst = []
for i in range(n):
    lst.insert(0, i)
list_time = time.perf_counter() - start
 
# Deque: O(1) for appendleft
start = time.perf_counter()
dq = deque()
for i in range(n):
    dq.appendleft(i)
deque_time = time.perf_counter() - start
 
print(f"List insert(0, x): {list_time:.4f}s")
print(f"Deque appendleft:  {deque_time:.4f}s")
print(f"Deque is {list_time / deque_time:.0f}x faster")
# Typical output:
# List insert(0, x): 1.2340s
# Deque appendleft:  0.0065s
# Deque is 190x faster

Operation	list	deque
`append(x)` (right)	O(1) amortized	O(1)
`pop()` (right)	O(1)	O(1)
`insert(0, x)` / `appendleft(x)`	O(n)	O(1)
`pop(0)` / `popleft()`	O(n)	O(1)
`access by index [i]`	O(1)	O(1) at the ends, O(n) in the middle
Memory per element	Lower	Slightly higher

Use deque when you need fast operations at both ends; use list when you need fast random access by index.
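
Breadth-first search is a typical workload where the queue behavior matters. The sketch below is a minimal illustration; the graph and the bfs_order helper are made up for this example and are not part of the collections API.

```python
from collections import deque

def bfs_order(graph, start):
    """Return nodes in breadth-first order starting from `start`."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()  # O(1), unlike list.pop(0)
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order

graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
print(bfs_order(graph, 'A'))  # ['A', 'B', 'C', 'D']
```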

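For full coverage, see: Python deque.

namedtuple: Tuples with Named Fields

namedtuple creates a tuple subclass with named fields, which makes code more self-documenting while avoiding the overhead of defining a full class.

Creating a namedtuple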

from collections import namedtuple
 
# Define a type
Point = namedtuple('Point', ['x', 'y'])
p = Point(3, 4)
 
# Access by name or index
print(p.x)     # 3
print(p[1])    # 4
print(p)       # Point(x=3, y=4)
 
# Alternative field definition styles
Color = namedtuple('Color', 'red green blue')        # Space-separated string
Config = namedtuple('Config', 'host, port, database')  # Comma-separated string

Why Use namedtuple Instead of a Plain Tuple?

from collections import namedtuple
 
# Plain tuple: which index is what?
employee_tuple = ('Alice', 'Engineering', 95000, True)
print(employee_tuple[2])  # 95000 -- but what does index 2 mean?
 
# namedtuple: self-documenting
Employee = namedtuple('Employee', 'name department salary active')
employee = Employee('Alice', 'Engineering', 95000, True)
print(employee.salary)     # 95000 -- immediately clear
print(employee.department) # Engineering

Key Methods

from collections import namedtuple
 
Employee = namedtuple('Employee', 'name department salary')
emp = Employee('Alice', 'Engineering', 95000)
 
# _replace: create a new instance with some fields changed (immutable)
promoted = emp._replace(salary=110000)
print(promoted)  # Employee(name='Alice', department='Engineering', salary=110000)
print(emp)       # Employee(name='Alice', department='Engineering', salary=95000)  -- unchanged
 
# _asdict: convert to a dict (a regular dict since Python 3.8, an OrderedDict before that)
print(emp._asdict())
# {'name': 'Alice', 'department': 'Engineering', 'salary': 95000}
 
# _fields: get field names
print(Employee._fields)  # ('name', 'department', 'salary')
 
# _make: create from an iterable
data = ['Bob', 'Marketing', 85000]
emp2 = Employee._make(data)
print(emp2)  # Employee(name='Bob', department='Marketing', salary=85000)

Default Values

from collections import namedtuple
 
# defaults parameter (Python 3.7+); values apply to the rightmost fields
Connection = namedtuple('Connection', 'host port timeout', defaults=[5432, 30])
conn1 = Connection('localhost')               # port=5432, timeout=30
conn2 = Connection('db.example.com', 3306)    # timeout=30
conn3 = Connection('db.example.com', 3306, 60)
 
print(conn1)  # Connection(host='localhost', port=5432, timeout=30)
print(conn2)  # Connection(host='db.example.com', port=3306, timeout=30)

The typing.NamedTuple Alternative

If you want type annotations and a more class-like syntax, use typing.NamedTuple:

from typing import NamedTuple
 
class Point(NamedTuple):
    x: float
    y: float
    label: str = "origin"
 
p = Point(3.0, 4.0, "A")
print(p.x, p.label)  # 3.0 A
 
# Still a tuple -- supports unpacking, indexing, iteration
x, y, label = p
print(f"({x}, {y})")  # (3.0, 4.0)

namedtuple vs dataclass

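Feature	namedtuple	dataclass
Immutable by default	Yes	No (requires `frozen=True`)
Memory footprint	Same as a tuple (small)	Larger (regular class)
Iteration/unpacking	Supported (it is a tuple)	Not supported (unless you add methods)
Type annotations	Via `typing.NamedTuple`	Built in
Methods/properties	Requires subclassing	Supported directly
Inheritance	Limited	Full class inheritance
Best for	Lightweight data records	Complex mutable objects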
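
A few rows of the table can be checked directly. The sketch below defines two illustrative types, PointNT and PointDC (names chosen for this example), and compares unpacking, equality with a plain tuple, and immutability.

```python
from collections import namedtuple
from dataclasses import dataclass, FrozenInstanceError, astuple

PointNT = namedtuple('PointNT', 'x y')

@dataclass(frozen=True)
class PointDC:
    x: float
    y: float

nt = PointNT(1.0, 2.0)
dc = PointDC(1.0, 2.0)

# A namedtuple unpacks directly; a dataclass needs astuple() or explicit fields
x, y = nt
x2, y2 = astuple(dc)

# A namedtuple compares equal to a plain tuple; a dataclass does not
print(nt == (1.0, 2.0))  # True
print(dc == (1.0, 2.0))  # False

# Both reject attribute assignment, with different exception types
try:
    nt.x = 5.0
except AttributeError:
    print("namedtuple is immutable")

try:
    dc.x = 5.0
except FrozenInstanceError:
    print("frozen dataclass is immutable")
```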

OrderedDict: Ordered Dictionary Operations

Since Python 3.7, a regular dict preserves insertion order. So what value does OrderedDict still offer?

When OrderedDict Still Matters

from collections import OrderedDict
 
# 1. Equality considers order
d1 = {'a': 1, 'b': 2}
d2 = {'b': 2, 'a': 1}
print(d1 == d2)  # True -- regular dicts ignore order in comparison
 
od1 = OrderedDict([('a', 1), ('b', 2)])
od2 = OrderedDict([('b', 2), ('a', 1)])
print(od1 == od2)  # False -- OrderedDict considers order
 
# 2. move_to_end() for reordering
od = OrderedDict([('a', 1), ('b', 2), ('c', 3)])
od.move_to_end('a')           # Move 'a' to the end
print(list(od.keys()))  # ['b', 'c', 'a']
 
od.move_to_end('c', last=False)  # Move 'c' to the beginning
print(list(od.keys()))  # ['c', 'b', 'a']

Building an LRU Cache with OrderedDict

from collections import OrderedDict
 
class LRUCache:
    def __init__(self, capacity):
        self.cache = OrderedDict()
        self.capacity = capacity
 
    def get(self, key):
        if key not in self.cache:
            return -1
        self.cache.move_to_end(key)  # Mark as recently used
        return self.cache[key]
 
    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # Remove oldest
 
cache = LRUCache(3)
cache.put('a', 1)
cache.put('b', 2)
cache.put('c', 3)
cache.get('a')       # Access 'a', moves it to end
cache.put('d', 4)    # Evicts 'b' (least recently used)
print(list(cache.cache.keys()))  # ['c', 'a', 'd']

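ChainMap: Layered Dictionary Lookup

ChainMap combines multiple dicts into a single view for lookups. It searches each dict in order and returns the first match. It is well suited to layered configuration, scoped variable lookup, and context management.

Basic Usage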

from collections import ChainMap
 
defaults = {'theme': 'light', 'language': 'en', 'timeout': 30}
user_prefs = {'theme': 'dark'}
session = {'language': 'fr'}
 
config = ChainMap(session, user_prefs, defaults)
 
# Lookup searches session -> user_prefs -> defaults
print(config['theme'])     # 'dark'    (from user_prefs)
print(config['language'])  # 'fr'      (from session)
print(config['timeout'])   # 30        (from defaults)
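
Lookups walk the whole chain, but writes and deletions only touch the first mapping. The sketch below illustrates this with made-up settings (the variable names are illustrative).

```python
from collections import ChainMap

defaults = {'theme': 'light', 'timeout': 30}
overrides = {}
settings = ChainMap(overrides, defaults)

# Writes land in the first mapping only
settings['theme'] = 'dark'
print(overrides)          # {'theme': 'dark'}
print(defaults['theme'])  # light

# Deleting removes the override and exposes the lower layer again
del settings['theme']
print(settings['theme'])  # light

# Keys that exist only in lower layers cannot be deleted through the ChainMap
try:
    del settings['timeout']
except KeyError:
    print("timeout lives in defaults, not in the first mapping")

# The underlying dicts are available through .maps
print(settings.maps)  # [{}, {'theme': 'light', 'timeout': 30}]
```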

Configuration Layering

from collections import ChainMap
import os
 
# Real-world config pattern: CLI args > env vars > config file > defaults
defaults = {
    'debug': False,
    'log_level': 'WARNING',
    'port': 8080,
    'host': '0.0.0.0',
}
 
config_file = {
    'log_level': 'INFO',
    'port': 9090,
}
 
env_vars = {
    k.lower(): v for k, v in os.environ.items()
    if k.lower() in defaults
}
 
cli_args = {'debug': True}  # Parsed from argparse
 
config = ChainMap(cli_args, env_vars, config_file, defaults)
print(config['debug'])      # True (from cli_args)
print(config['log_level'])  # 'INFO' (from config_file)
print(config['host'])       # '0.0.0.0' (from defaults)

Scoped Contexts with new_child()

from collections import ChainMap
 
# Simulating variable scoping (like nested function scopes)
global_scope = {'x': 1, 'y': 2}
local_scope = ChainMap(global_scope)
 
# Enter a new scope
inner_scope = local_scope.new_child()
inner_scope['x'] = 10  # Shadows global x
inner_scope['z'] = 30  # New local variable
 
print(inner_scope['x'])  # 10 (local)
print(inner_scope['y'])  # 2  (falls through to global)
print(inner_scope['z'])  # 30 (local)
 
# Exit scope -- original is unchanged
print(local_scope['x'])  # 1 (global still intact)

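Comparison of All Collection Types

Type	Base Class	Mutable	Use Case	Key Advantage
`Counter`	`dict`	Yes	Counting	`most_common()`, multiset operations
`defaultdict`	`dict`	Yes	Auto-initializing missing keys	No `KeyError`, factory function
`deque`	`object`	Yes	Double-ended queue	O(1) at both ends, `maxlen`
`namedtuple`	`tuple`	No	Structured data records	Named field access, lightweight
`OrderedDict`	`dict`	Yes	Order-sensitive dict	`move_to_end()`, order affects equality
`ChainMap`	`MutableMapping`	Yes	Layered lookup	Configuration layering, scoped contexts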

Performance Benchmarks

Counter vs Manual Counting

from collections import Counter, defaultdict
import time
 
data = list(range(1000)) * 1000  # 1 million items, 1000 unique
 
# Method 1: Counter
start = time.perf_counter()
c = Counter(data)
counter_time = time.perf_counter() - start
 
# Method 2: defaultdict(int)
start = time.perf_counter()
dd = defaultdict(int)
for item in data:
    dd[item] += 1
dd_time = time.perf_counter() - start
 
# Method 3: Manual dict
start = time.perf_counter()
manual = {}
for item in data:
    manual[item] = manual.get(item, 0) + 1
manual_time = time.perf_counter() - start
 
print(f"Counter:         {counter_time:.4f}s")
print(f"defaultdict(int):{dd_time:.4f}s")
print(f"dict.get():      {manual_time:.4f}s")
# Typical: Counter ~0.03s, defaultdict ~0.07s, dict.get() ~0.09s

deque vs list for Queue Operations

from collections import deque
import time
 
n = 100_000
 
# Simulate a FIFO queue: append right, pop left
# List
start = time.perf_counter()
q = list(range(n))
while q:
    q.pop(0)
list_queue_time = time.perf_counter() - start
 
# Deque
start = time.perf_counter()
q = deque(range(n))
while q:
    q.popleft()
deque_queue_time = time.perf_counter() - start
 
print(f"List pop(0):     {list_queue_time:.4f}s")
print(f"Deque popleft(): {deque_queue_time:.4f}s")
print(f"Deque is {list_queue_time / deque_queue_time:.0f}x faster")
# Typical: List ~2.5s, Deque ~0.004s -> ~600x faster

Real-World Examples

Log Analysis with Counter

from collections import Counter
 
# Parse and analyze server logs
log_lines = [
    "2026-02-18 10:15:03 GET /api/users 200",
    "2026-02-18 10:15:04 POST /api/login 401",
    "2026-02-18 10:15:05 GET /api/users 200",
    "2026-02-18 10:15:06 GET /api/products 500",
    "2026-02-18 10:15:07 POST /api/login 200",
    "2026-02-18 10:15:08 GET /api/users 200",
    "2026-02-18 10:15:09 GET /api/products 500",
    "2026-02-18 10:15:10 POST /api/login 401",
]
 
# Count status codes
status_codes = Counter(line.split()[-1] for line in log_lines)
print("Status codes:", status_codes.most_common())
# [('200', 4), ('401', 2), ('500', 2)]
 
# Count endpoints
endpoints = Counter(line.split()[3] for line in log_lines)
print("Top endpoints:", endpoints.most_common(2))
# [('/api/users', 3), ('/api/login', 3)]
 
# Count error endpoints (status >= 400)
errors = Counter(
    line.split()[3] for line in log_lines
    if int(line.split()[-1]) >= 400
)
print("Error endpoints:", errors)
# Counter({'/api/login': 2, '/api/products': 2})

Configuration Management with ChainMap

from collections import ChainMap
import json
 
# Multi-layer config system for a web application
def load_config(config_path=None, cli_overrides=None):
    # Layer 1: Hard-coded defaults
    defaults = {
        'host': '127.0.0.1',
        'port': 8000,
        'debug': False,
        'db_pool_size': 5,
        'log_level': 'WARNING',
        'cors_origins': ['http://localhost:3000'],
    }
 
    # Layer 2: Config file
    file_config = {}
    if config_path:
        with open(config_path) as f:
            file_config = json.load(f)
 
    # Layer 3: CLI overrides (highest priority)
    cli = cli_overrides or {}
 
    # ChainMap searches cli -> file_config -> defaults
    return ChainMap(cli, file_config, defaults)
 
# Usage
config = load_config(cli_overrides={'debug': True, 'port': 9000})
print(config['debug'])        # True (CLI override)
print(config['port'])         # 9000 (CLI override)
print(config['db_pool_size']) # 5    (default)
print(config['log_level'])    # WARNING (default)

A Recent-Items Cache with deque

from collections import deque
 
class RecentItemsTracker:
    """Track the N most recent unique items."""
 
    def __init__(self, max_items=10):
        self.items = deque(maxlen=max_items)
        self.seen = set()
 
    def add(self, item):
        if item in self.seen:
            # Move to the most-recent (right) end; note that remove() is O(n)
            self.items.remove(item)
            self.items.append(item)
        else:
            if len(self.items) == self.items.maxlen:
                # Remove the oldest item from the set too
                oldest = self.items[0]
                self.seen.discard(oldest)
            self.items.append(item)
            self.seen.add(item)
 
    def get_recent(self):
        return list(reversed(self.items))
 
# Track recently viewed products
tracker = RecentItemsTracker(max_items=5)
for product in ['shoes', 'shirt', 'hat', 'shoes', 'jacket', 'belt', 'hat']:
    tracker.add(product)
 
print(tracker.get_recent())
# ['hat', 'belt', 'jacket', 'shoes', 'shirt']

Building a Data Pipeline with namedtuple

from collections import namedtuple, Counter, defaultdict
 
# Define structured records
Transaction = namedtuple('Transaction', 'id customer product amount date')
 
transactions = [
    Transaction(1, 'Alice', 'Widget', 29.99, '2026-02-01'),
    Transaction(2, 'Bob', 'Gadget', 49.99, '2026-02-01'),
    Transaction(3, 'Alice', 'Widget', 29.99, '2026-02-03'),
    Transaction(4, 'Charlie', 'Gadget', 49.99, '2026-02-05'),
    Transaction(5, 'Alice', 'Gizmo', 19.99, '2026-02-07'),
    Transaction(6, 'Bob', 'Widget', 29.99, '2026-02-08'),
]
 
# Most popular products
product_count = Counter(t.product for t in transactions)
print("Popular products:", product_count.most_common())
# [('Widget', 3), ('Gadget', 2), ('Gizmo', 1)]
 
# Revenue by customer
revenue = defaultdict(float)
for t in transactions:
    revenue[t.customer] += t.amount
print("Revenue:", dict(revenue))
# {'Alice': 79.97, 'Bob': 79.98, 'Charlie': 49.99}
 
# Convert to DataFrame for visualization
import pandas as pd
df = pd.DataFrame(transactions, columns=Transaction._fields)
print(df.groupby('customer')['amount'].sum())

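Visualizing Collection Data with PyGWalker

Once you have processed data with Counter, defaultdict, or namedtuple, you will often want to visualize the results. PyGWalker turns any pandas DataFrame into an interactive, Tableau-like visualization interface that you can use inside Jupyter notebooks: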

from collections import Counter
import pandas as pd
import pygwalker as pyg
 
# Process data with collections
log_data = ["ERROR", "WARNING", "ERROR", "INFO", "ERROR", "WARNING", "INFO", "INFO"]
counts = Counter(log_data)
 
# Convert to DataFrame
df = pd.DataFrame(counts.items(), columns=['Level', 'Count'])
 
# Launch interactive visualization
walker = pyg.walk(df)

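It supports dragging and dropping fields, creating charts, filtering data, and interactively exploring distributions and patterns, all without hand-writing visualization code. It is especially helpful for understanding the shape of your data after you have processed a large dataset and grouped it into summary statistics with Counter or defaultdict.

If you want to run these collections experiments interactively, RunCell provides an AI-powered Jupyter environment that lets you iterate on data processing pipelines with immediate feedback.

Combining Multiple Collection Types

The real power of collections often shows when several types are chained together in a single pipeline.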

from collections import Counter, defaultdict, namedtuple, deque
 
# Named record type
LogEntry = namedtuple('LogEntry', 'timestamp level message')
 
# Simulated log stream
log_stream = deque([
    LogEntry('10:01', 'ERROR', 'Connection timeout'),
    LogEntry('10:02', 'INFO', 'Request processed'),
    LogEntry('10:03', 'ERROR', 'Connection timeout'),
    LogEntry('10:04', 'WARNING', 'High memory'),
    LogEntry('10:05', 'ERROR', 'Disk full'),
    LogEntry('10:06', 'INFO', 'Request processed'),
    LogEntry('10:07', 'ERROR', 'Connection timeout'),
], maxlen=100)
 
# Count error types
error_counts = Counter(
    entry.message for entry in log_stream if entry.level == 'ERROR'
)
print("Error types:", error_counts.most_common())
# [('Connection timeout', 3), ('Disk full', 1)]
 
# Group entries by level
by_level = defaultdict(list)
for entry in log_stream:
    by_level[entry.level].append(entry)
 
for level, entries in by_level.items():
    print(f"{level}: {len(entries)} entries")
# ERROR: 4 entries
# INFO: 2 entries
# WARNING: 1 entries

FAQ

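What is the Python collections module?

The collections module is part of Python's standard library. It provides specialized container data types that add capabilities on top of the built-in types (dict, list, tuple, set). The main classes are Counter, defaultdict, deque, namedtuple, OrderedDict, and ChainMap. Each one solves a particular class of data-handling problem more efficiently than the built-in types alone.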

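When should I use Counter, and when should I use defaultdict(int)?

Use Counter when your main goal is counting or comparing frequency distributions: it provides most_common(), arithmetic operators (+, -, &, |), and can count an entire iterable in one step at construction time. Use defaultdict(int) when counting is only one part of a larger data-structure pattern, or when you need a general-purpose dictionary with integer defaults.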

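Is deque thread-safe in Python?

Yes, for individual operations. The standard library documentation describes appends and pops from either side of a deque as thread-safe, and in CPython deque.append(), deque.appendleft(), deque.pop(), and deque.popleft() are atomic because of the GIL (Global Interpreter Lock). As a result, a deque can be used as a thread-safe queue without additional locks for those single calls. Compound operations, such as a check-then-act sequence that tests for emptiness before popping, still require explicit synchronization.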
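
The sketch below illustrates the distinction under simple assumptions: two producer threads fill a shared deque, then four consumer threads drain it. The names tasks, results, producer, and consumer are illustrative. Single appends go unguarded, while the "check, then pop" step is wrapped in a lock.

```python
import threading
from collections import deque

tasks = deque()
results = []
lock = threading.Lock()

def producer(n):
    for i in range(n):
        tasks.append(i)  # a single append needs no extra lock

def consumer():
    while True:
        # "check, then pop" is a compound operation, so guard it with a lock
        with lock:
            if not tasks:
                return
            item = tasks.popleft()
        results.append(item * 2)

# Fill the queue first so the consumers have a fixed amount of work
producers = [threading.Thread(target=producer, args=(100,)) for _ in range(2)]
for t in producers:
    t.start()
for t in producers:
    t.join()

consumers = [threading.Thread(target=consumer) for _ in range(4)]
for t in consumers:
    t.start()
for t in consumers:
    t.join()

print(len(results))  # 200
```

When consumers need to block and wait for new work, queue.Queue from the standard library is usually the more convenient tool.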

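What is the difference between namedtuple and dataclass?

namedtuple creates an immutable tuple subclass with named fields. It is lightweight, supports iteration and unpacking, and uses little memory. dataclass (from the dataclasses module, Python 3.7+) creates a full class whose attributes are mutable by default and which supports methods, properties, and inheritance. Use namedtuple for simple immutable records; use dataclass when you need mutability, complex behavior, or richer type annotations.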

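Does OrderedDict still matter in Python 3.7+?

Yes, but mainly in two situations. First, equality comparison between OrderedDicts takes element order into account (OrderedDict(a=1, b=2) != OrderedDict(b=2, a=1)), whereas comparison between regular dicts does not. Second, OrderedDict provides move_to_end() for reordering elements, which is useful when implementing LRU caches and priority-based data structures. In most other cases, a regular dict is sufficient and performs better.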

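How is ChainMap different from merging dictionaries?

ChainMap provides a lookup view over multiple dicts without copying data: lookups search each dict in order, and changes to the underlying dicts are reflected in the ChainMap immediately. By contrast, {**d1, **d2} or d1 | d2 (Python 3.9+) creates a new dict and copies every entry. For large dictionaries, ChainMap uses less memory and preserves the layered structure, which makes it a good fit for configuration and scoping patterns.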
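
A short sketch of the difference, using made-up settings (example.internal is a placeholder host name): the ChainMap view tracks later changes to its sources, while the merged dict is an independent copy.

```python
from collections import ChainMap

base = {'host': 'localhost', 'port': 8000}
override = {'port': 9000}

view = ChainMap(override, base)  # no copying; the first mapping wins
merged = {**base, **override}    # new dict; the last mapping wins

print(view['port'], merged['port'])  # 9000 9000

# Later changes to the sources show up in the view but not in the copy
base['host'] = 'example.internal'
print(view['host'])    # example.internal
print(merged['host'])  # localhost

# dict(view) flattens the layers into an independent snapshot
snapshot = dict(view)
print(snapshot == {'host': 'example.internal', 'port': 9000})  # True
```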

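Can collections types be used with type hints?

Yes. On Python 3.9+ you can write collections.Counter[str] for a typed Counter, collections.defaultdict[str, list[int]] for a typed defaultdict, and collections.deque[int] for a typed deque. For namedtuple, typing.NamedTuple is preferred because it lets you write type annotations directly in the class definition. All of these types are compatible with type checkers such as mypy.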
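
As an illustration, the sketch below annotates three small helper functions. It assumes Python 3.9+ for the subscripted built-in generics; the Reading type and the function names are hypothetical.

```python
from collections import Counter, defaultdict, deque
from typing import NamedTuple

class Reading(NamedTuple):
    sensor: str
    value: float

def count_sensors(readings: list[Reading]) -> Counter[str]:
    """Count how many readings each sensor produced."""
    return Counter(r.sensor for r in readings)

def group_values(readings: list[Reading]) -> defaultdict[str, list[float]]:
    """Group reading values by sensor name."""
    grouped: defaultdict[str, list[float]] = defaultdict(list)
    for r in readings:
        grouped[r.sensor].append(r.value)
    return grouped

def last_n(readings: list[Reading], n: int) -> deque[Reading]:
    """Keep only the n most recent readings."""
    return deque(readings, maxlen=n)

readings = [Reading('a', 1.0), Reading('b', 2.5), Reading('a', 3.0)]
print(count_sensors(readings))       # Counter({'a': 2, 'b': 1})
print(dict(group_values(readings)))  # {'a': [1.0, 3.0], 'b': [2.5]}
print(list(last_n(readings, 2)))     # [Reading(sensor='b', value=2.5), Reading(sensor='a', value=3.0)]
```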

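Summary

Python's collections module provides six specialized container types that remove common boilerplate patterns: Counter replaces hand-written counting loops; defaultdict removes KeyError handling; deque provides efficient operations at both ends; namedtuple adds readable field names to tuples; OrderedDict handles order-sensitive comparison and reordering; and ChainMap manages layered dictionary lookups without copying data.

Each type is a better fit than the built-in containers for a specific kind of problem. Knowing when to use them makes your Python code shorter, faster, and easier to maintain. The key is to match the data structure to the access pattern: counting (Counter), grouping (defaultdict), queues and stacks (deque), structured records (namedtuple), ordered operations (OrderedDict), and layered lookups (ChainMap).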
