Python Counter: Counting and Tallying Elements with collections.Counter

Name: Soren Atelier

Updated on 2026/2/10

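Counting how often elements appear in a list, characters in a string, or words in a document is one of the most common tasks in programming. Doing it by hand means writing a loop, initializing a dictionary, and handling missing keys with conditionals or .get() calls. That boilerplate obscures your intent, invites bugs, and slows you down every time you need a frequency count.

Python's collections.Counter removes that friction. It counts hashable objects in a single line, provides built-in methods for finding the most common elements, and supports arithmetic operations for comparing frequency distributions. This guide covers what you need to use Counter effectively, from basic counting to advanced multiset operations.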

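What Is collections.Counter?

Counter is a dict subclass in Python's collections module, designed for counting hashable objects. Each element is stored as a dictionary key, and its count is stored as the corresponding value.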

from collections import Counter
 
fruits = ['apple', 'banana', 'apple', 'cherry', 'banana', 'apple']
count = Counter(fruits)
print(count)
# Counter({'apple': 3, 'banana': 2, 'cherry': 1})

Creating a Counter

Counter accepts several input types, which makes it flexible across different data sources.

from collections import Counter
 
# From a list
colors = Counter(['red', 'blue', 'red', 'green', 'blue', 'red'])
print(colors)  # Counter({'red': 3, 'blue': 2, 'green': 1})

# From a string (counts each character)
letters = Counter('mississippi')
print(letters)  # Counter({'i': 4, 's': 4, 'p': 2, 'm': 1})

# From a dictionary
inventory = Counter({'apples': 15, 'oranges': 10, 'bananas': 7})

# From keyword arguments
stock = Counter(laptops=5, monitors=12, keyboards=30)

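Accessing Counts

Basic Indexing

Access counts as you would in a regular dictionary, except that a missing key returns 0 instead of raising KeyError: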

from collections import Counter
 
c = Counter(['a', 'b', 'a'])
print(c['a'])  # 2
print(c['z'])  # 0 (no KeyError!)

# Compare with plain dict behavior
d = {'a': 2, 'b': 1}
# d['z']  # would raise KeyError

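This zero-default behavior removes the need for .get(key, 0) or defaultdict(int) in many counting scenarios. Looking up a missing key does not add that key to the Counter.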
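To make the comparison concrete, here is a minimal sketch that builds the same tally three ways. The sample list `items` and the variable names are illustrative only.

```python
from collections import Counter, defaultdict

items = ['a', 'b', 'a', 'c', 'a']

# Plain dict: .get() supplies the default for unseen keys
manual = {}
for item in items:
    manual[item] = manual.get(item, 0) + 1

# defaultdict(int): unseen keys start at 0 automatically
dd = defaultdict(int)
for item in items:
    dd[item] += 1

# Counter: one call, no explicit loop
counted = Counter(items)

# All three hold the same counts
print(manual == dict(dd) == dict(counted))  # True
```

All three produce identical counts; the Counter version simply states the intent in one line.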

Getting All Elements

The elements() method returns an iterator over the elements, repeating each one as many times as its count:

from collections import Counter
 
c = Counter(a=3, b=2, c=1)
print(list(c.elements()))
# ['a', 'a', 'a', 'b', 'b', 'c']
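One detail worth knowing is that elements() skips any element whose count is less than one. A small sketch with made-up counts:

```python
from collections import Counter

c = Counter(a=2, b=0, c=-1)

# Zero and negative counts contribute nothing to the expansion
expanded = list(c.elements())
print(expanded)  # ['a', 'a']

# Rebuilding a Counter from elements() therefore keeps only positive counts
print(Counter(expanded))  # Counter({'a': 2})
```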

The most_common() Method

The most_common() method returns (element, count) pairs sorted from most to least frequent.

from collections import Counter
 
text = "to be or not to be that is the question"
word_freq = Counter(text.split())
 
# All elements, sorted by frequency
print(word_freq.most_common())
# [('to', 2), ('be', 2), ('or', 1), ('not', 1), ...]

# Top N elements only
log_levels = Counter(['INFO', 'WARNING', 'INFO', 'ERROR', 'INFO', 'DEBUG',
                      'WARNING', 'INFO', 'ERROR', 'INFO'])
print(log_levels.most_common(2))
# [('INFO', 5), ('WARNING', 2)]

# Least common (reverse slice: the last two entries, rarest first)
print(log_levels.most_common()[:-3:-1])
# [('DEBUG', 1), ('ERROR', 2)]
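When several elements share the same count, most_common() lists them in the order they were first encountered. The sketch below illustrates this with a made-up list and shows one possible way to impose an alphabetical tie-break instead; the `ranked` sort is an illustration, not something Counter provides.

```python
from collections import Counter

# 'pear' and 'fig' both appear twice; 'pear' is seen first
tally = Counter(['pear', 'fig', 'fig', 'pear', 'kiwi'])
print(tally.most_common())
# [('pear', 2), ('fig', 2), ('kiwi', 1)]

# To break ties alphabetically, sort explicitly:
# descending by count, then ascending by element
ranked = sorted(tally.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranked)
# [('fig', 2), ('pear', 2), ('kiwi', 1)]
```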

Counter Arithmetic

Counter supports arithmetic and set-style operations for combining and comparing frequency distributions.

from collections import Counter
 
morning = Counter(coffee=10, tea=5, juice=3)
afternoon = Counter(coffee=8, tea=7, water=4)
 
# Addition: combine counts
total = morning + afternoon
print(total)  # Counter({'coffee': 18, 'tea': 12, 'water': 4, 'juice': 3})

# Subtraction: keep only positive counts
stock = Counter(apples=20, oranges=15, bananas=10)
sold = Counter(apples=8, oranges=15, bananas=12)
remaining = stock - sold
print(remaining)  # Counter({'apples': 12})

# Intersection (&): minimum of corresponding counts
a = Counter(apple=3, banana=2, cherry=5)
b = Counter(apple=1, banana=4, cherry=2)
print(a & b)  # Counter({'banana': 2, 'cherry': 2, 'apple': 1})

# Union (|): maximum of corresponding counts
print(a | b)  # Counter({'cherry': 5, 'banana': 4, 'apple': 3})

# Unary +: drop zero and negative counts
c = Counter(a=3, b=0, c=-2)
print(+c)  # Counter({'a': 3})
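Because subtraction discards zero and negative results, it can double as a multiset containment check: if nothing is left after subtracting, the first Counter is fully covered by the second. A small sketch, where the helper name `can_build` is hypothetical:

```python
from collections import Counter

def can_build(word, letters):
    """Return True if every letter of `word` is available in `letters`."""
    # Subtraction drops zero and negative counts, so an empty result
    # means no letter of `word` was left uncovered.
    missing = Counter(word) - Counter(letters)
    return not missing

print(can_build("tea", "teapot"))    # True
print(can_build("teeth", "teapot"))  # False (needs a second 'e' and an 'h')
```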

Updating and Subtracting

from collections import Counter

# update() adds to counts (it does not replace them as dict.update does)
c = Counter(a=3, b=1)
c.update(['a', 'b', 'b', 'c'])
print(c)  # Counter({'a': 4, 'b': 3, 'c': 1})

# subtract() subtracts counts (keeps zero and negative values)
c = Counter(a=4, b=2, c=0)
c.subtract(Counter(a=1, b=3, c=2))
print(c)  # Counter({'a': 3, 'b': -1, 'c': -2})

# total() returns the sum of all counts (Python 3.10+)
inventory = Counter(widgets=50, gadgets=30, gizmos=20)
print(inventory.total())  # 100
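Since update() accumulates rather than replaces, it suits data that arrives in pieces. The following sketch assumes some hypothetical batches of HTTP methods; in practice they might be chunks read from a file or a stream.

```python
from collections import Counter

# Hypothetical batches, e.g. records read one chunk at a time
batches = [
    ['GET', 'POST', 'GET'],
    ['GET', 'DELETE'],
    ['POST', 'GET'],
]

running = Counter()
for batch in batches:
    running.update(batch)  # adds to the existing counts

print(running)
# Counter({'GET': 4, 'POST': 2, 'DELETE': 1})
```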

Practical Use Cases

Word Frequency Analysis

from collections import Counter
import re
 
text = """
Python is a versatile programming language. Python is used for web development,
data science, machine learning, and automation. Python's simplicity makes it
a favorite among developers.
"""
 
words = re.findall(r'\b[a-z]+\b', text.lower())
word_freq = Counter(words)
 
print("Top 5 most frequent words:")
for word, count in word_freq.most_common(5):
    print(f"  {word}: {count}")
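The same approach extends to word pairs, because tuples are hashable and can serve as Counter keys. A minimal sketch using a short sample sentence chosen for illustration:

```python
from collections import Counter

words = "to be or not to be that is the question".split()

# Pair each word with its successor to form bigrams
bigrams = Counter(zip(words, words[1:]))

print(bigrams.most_common(1))
# [(('to', 'be'), 2)]
```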

Anagram Detection

from collections import Counter
 
def are_anagrams(word1, word2):
    return Counter(word1.lower()) == Counter(word2.lower())
 
print(are_anagrams("listen", "silent"))  # True
print(are_anagrams("hello", "world"))    # False

Vote Counting System

from collections import Counter
 
votes = ['Alice', 'Bob', 'Alice', 'Charlie', 'Bob', 'Alice',
         'Charlie', 'Alice', 'Bob', 'Alice']
 
results = Counter(votes)
winner, winning_votes = results.most_common(1)[0]
total_votes = sum(results.values())
 
print(f"Election Results (Total votes: {total_votes}):")
for candidate, count in results.most_common():
    percentage = (count / total_votes) * 100
    print(f"  {candidate}: {count} votes ({percentage:.1f}%)")
print(f"\nWinner: {winner} with {winning_votes} votes")
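Note that most_common(1) returns a single entry even when two candidates are tied for first place. If ties matter, one option is to compare every count against the maximum. The helper `winners` below is a hypothetical sketch of that idea:

```python
from collections import Counter

def winners(votes):
    """Return every candidate tied for the highest vote count."""
    results = Counter(votes)
    if not results:
        return []
    top = max(results.values())
    return [name for name, count in results.items() if count == top]

print(winners(['Alice', 'Bob', 'Alice', 'Bob', 'Charlie']))  # ['Alice', 'Bob']
print(winners(['Alice', 'Bob', 'Alice']))                    # ['Alice']
```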

Log File Analysis

from collections import Counter
import re
 
log_entries = [
    "2026-02-10 08:15:00 ERROR Database connection failed",
    "2026-02-10 08:15:01 INFO Retrying connection",
    "2026-02-10 08:15:02 ERROR Database connection failed",
    "2026-02-10 08:16:00 WARNING Disk usage at 85%",
    "2026-02-10 08:17:00 INFO Request processed",
    "2026-02-10 08:18:00 ERROR API timeout",
]
 
levels = Counter(
    re.search(r'(INFO|WARNING|ERROR)', line).group()
    for line in log_entries
)
print("Log Level Distribution:")
for level, count in levels.most_common():
    print(f"  {level}: {count}")
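The example above assumes every line contains a recognized level; re.search() returns None otherwise, and calling .group() on None raises AttributeError. A more tolerant sketch might file unmatched lines under a placeholder label. The 'UNKNOWN' label, the extended level list, and the helper name `count_levels` are assumptions made for illustration.

```python
from collections import Counter
import re

# Assumed set of level names; adjust to match the real log format
LEVEL_RE = re.compile(r'\b(DEBUG|INFO|WARNING|ERROR|CRITICAL)\b')

def count_levels(lines):
    """Count log levels, filing unmatched lines under 'UNKNOWN'."""
    levels = Counter()
    for line in lines:
        match = LEVEL_RE.search(line)
        levels[match.group(1) if match else 'UNKNOWN'] += 1
    return levels

sample = [
    "2026-02-10 08:15:00 ERROR Database connection failed",
    "2026-02-10 08:15:01 INFO Retrying connection",
    "malformed line without a level",
]
print(count_levels(sample))
# Counter({'ERROR': 1, 'INFO': 1, 'UNKNOWN': 1})
```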

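Counter vs defaultdict(int): When to Use Which

Feature	`Counter`	`defaultdict(int)`
Purpose	Built specifically for counting	General-purpose dictionary with default values
Initialization	`Counter(iterable)` counts in one step	Requires a manual loop
Missing keys	Returns `0` without storing the key	Returns `0` and stores the key
`most_common()`	Built-in method	Requires manual sorting
Arithmetic (`+`, `-`)	Supported	Not supported
Set operations (`&`, `|`)	Supported	Not supported
`elements()`	Returns an expanded iterator	Not available
`update()` behavior	Adds to counts	Replaces values
Performance	C-accelerated bulk counting in CPython	Typically slower when counting in a Python loop
Best for	Frequency analysis, multiset operations	Custom default-value logic

Use Counter when your main goal is counting elements or comparing frequency distributions. Use defaultdict(int) when counting is an incidental part of a broader data-structure pattern.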
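One practical difference between the two is what a missing-key lookup does. Both return 0, but defaultdict(int) also inserts the key, while Counter leaves itself unchanged. A short sketch with illustrative data:

```python
from collections import Counter, defaultdict

c = Counter(['a', 'b', 'a'])
d = defaultdict(int, {'a': 2, 'b': 1})

c['z']  # returns 0, does not store 'z'
d['z']  # returns 0 and stores 'z' with value 0

print('z' in c)        # False
print('z' in d)        # True
print(len(c), len(d))  # 2 3
```

This matters when you later iterate over the keys or check membership, since a defaultdict can accumulate keys that were only ever read.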

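Visualizing Frequency Distributions

After counting elements with Counter, you often want to visualize the distribution. For interactive exploration of frequency distributions, PyGWalker can turn a pandas DataFrame into a Tableau-like interactive UI directly in Jupyter: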

from collections import Counter
import pandas as pd
import pygwalker as pyg
 
# Convert the Counter to a DataFrame
data = Counter("abracadabra")
df = pd.DataFrame(data.items(), columns=['Character', 'Count'])

# Launch the interactive visualization
walker = pyg.walk(df)

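This is especially useful when you have a large Counter and want to filter, sort, and explore the frequency distribution interactively.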

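Frequently Asked Questions

What does collections.Counter do in Python?

collections.Counter is a dict subclass for counting hashable objects. Pass it any iterable and it returns a dictionary-like object whose keys are the elements and whose values are their counts. It provides methods such as most_common() for frequency analysis, arithmetic operators for combining counts, and returns zero for missing keys instead of raising KeyError.

How do I count occurrences of items in a list with Python Counter?

Import Counter from collections and pass your list directly: Counter(['a', 'b', 'a', 'c']) returns Counter({'a': 2, 'b': 1, 'c': 1}). To get a specific count, index by element: counter['a'] returns 2. For the most frequent items, use counter.most_common(n).

What is the difference between Counter and a regular dictionary for counting?

Counter is built for counting: it tallies an entire iterable in one call, returns 0 for missing keys instead of raising KeyError, offers most_common() for ranked frequencies, supports arithmetic and set-style operators (+, -, &, |), and has an update() that adds to counts rather than replacing them. A regular dictionary has to be filled with a manual loop and lacks these features.

Can you add or subtract two Counter objects?

Yes. The + operator combines counts: Counter(a=3) + Counter(a=1) produces Counter({'a': 4}). The - operator subtracts counts and discards zero or negative results. The & operator gives the minimum of corresponding counts (intersection), and | gives the maximum (union). For subtraction that keeps negative counts, use the subtract() method.

Is Python Counter efficient for large datasets?

Yes. In CPython, counting an iterable is accelerated by a C helper function, which makes Counter faster than counting in a hand-written Python loop. Creating a Counter from an iterable is O(n). Accessing a single count is O(1) on average. most_common(k) uses a heap internally, giving O(n log k) when you only need the top k elements; calling most_common() with no argument sorts all entries, which is O(n log n).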
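To illustrate the heap-based selection, the sketch below performs the same top-k pick by hand with heapq.nlargest, which is what current CPython's most_common(n) delegates to. Treat it as an illustration of the idea rather than a replacement for the built-in method.

```python
from collections import Counter
from operator import itemgetter
import heapq

c = Counter("abracadabra")

top2 = c.most_common(2)

# The same selection done by hand: keep the 2 largest items by count
by_heap = heapq.nlargest(2, c.items(), key=itemgetter(1))

print(top2)             # [('a', 5), ('b', 2)]
print(top2 == by_heap)  # True
```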

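Summary

Python's collections.Counter is the standard tool for counting elements and performing frequency analysis. It replaces manual dictionary counting with a single constructor call, provides most_common() for instant ranking, and supports arithmetic operations that make multiset comparisons straightforward. Whether you are analyzing word frequencies, tallying votes, managing inventory, or parsing log files, Counter handles the counting so you can focus on the logic that matters.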
