Pandas Rolling Windows: Rolling, Expanding, and EWM

Name: Rajiv Chandra

Updated 2025/11/30

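Moving averages and signal smoothing are central to time-series analysis, yet many teams keep tripping over window alignment, missing values, and slow Python loops.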

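Problem: Computing rolling statistics by hand is error-prone and often ends up misaligned with the timestamps.
Pain point: Loops and ad-hoc shift calls easily cause off-by-one errors, unexpected NaNs in the first few periods, and sluggish notebooks.
Solution: Use rolling, expanding, and ewm with the right window definition (min_periods, time-based windows, center, adjust) to get correct, fast, vectorized results.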
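To make the off-by-one risk concrete, here is a small sketch that computes a 3-period moving average with a hand-written loop and checks it against `rolling(3).mean()`. The revenue values mirror the sample data used in this article; the loop is only an illustration of the error-prone approach, not a recommended pattern.

```python
import numpy as np
import pandas as pd

# Same toy revenue values as the sample data in this article
revenue = pd.Series(
    [10, 12, 9, 14, 15, 13, 11, 16],
    index=pd.date_range("2024-01-01", periods=8, freq="D"),
    dtype="float64",
)

# Hand-written loop: the slice bounds are easy to get wrong
window = 3
manual = []
for i in range(len(revenue)):
    if i + 1 < window:
        manual.append(np.nan)  # not enough history yet
    else:
        # rows i-2, i-1, i: the window ends at (and includes) the current row
        manual.append(revenue.iloc[i - window + 1 : i + 1].mean())
manual = pd.Series(manual, index=revenue.index)

# Vectorized equivalent, aligned to the same timestamps
vectorized = revenue.rolling(window=window).mean()

# Raises AssertionError if the two disagree anywhere (including NaN positions)
pd.testing.assert_series_equal(manual, vectorized)
```

Writing the slice as `iloc[i - window : i]` instead would exclude the current row, which is exactly the kind of silent shift the vectorized call avoids.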


Cheat Sheet

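Window type	Use case	Key parameters
`rolling`	Moving averages, volatility, custom window functions	`window=3` (or `"7D"`), `min_periods`, `center`, `win_type`, `on`
`expanding`	Cumulative statistics from the start of the series	`min_periods`
`ewm`	Exponentially decaying smoothing or weighted metrics	`span`, `alpha`, `halflife`, `adjust`, `times`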

Sample Data

import pandas as pd
 
dates = pd.date_range("2024-01-01", periods=8, freq="D")
sales = pd.DataFrame({"date": dates, "revenue": [10, 12, 9, 14, 15, 13, 11, 16]})
sales = sales.set_index("date")

Rolling Windows (Fixed-Size and Time-Based)

Fixed-size windows

sales["rev_ma3"] = (
    sales["revenue"]
    .rolling(window=3, min_periods=2)
    .mean()
)

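min_periods controls when results start appearing: rows stay NaN until the window holds at least that many non-missing observations.
center=True labels each statistic at the middle of its window instead of its right edge (very handy for plotting).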
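A quick way to see what `center=True` changes is to compute the same 3-row mean both ways. This sketch rebuilds the revenue series from the sample data; with `center=True` each value is labelled at the middle of its window, so the interior of the centered result is the trailing result moved up by one row, and `min_periods=2` lets both edges of the centered version produce a value.

```python
import pandas as pd

revenue = pd.Series(
    [10, 12, 9, 14, 15, 13, 11, 16],
    index=pd.date_range("2024-01-01", periods=8, freq="D"),
)

# Trailing window: rows t-2..t, labelled at t
trailing = revenue.rolling(window=3, min_periods=2).mean()

# Centered window: rows t-1..t+1, labelled at t
centered = revenue.rolling(window=3, min_periods=2, center=True).mean()

comparison = pd.DataFrame(
    {"revenue": revenue, "trailing": trailing, "centered": centered}
)
print(comparison)
```

Note that a centered window uses the next row's value, so it suits plots and descriptive smoothing rather than features that must only see the past.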

Time-based windows on a datetime index or an `on=` column

sales_reset = sales.reset_index()
sales_reset["rev_7d_mean"] = (
    sales_reset.rolling("7D", on="date")["revenue"].mean()
)

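For irregularly sampled data, use duration strings ("7D", "48h"): pandas then selects the rows whose timestamps fall within the look-back period rather than a fixed number of rows.
To control which window endpoints are included, set closed= to "right" (the default), "left", "both", or "neither".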
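The effect of time-based windows and `closed=` is easiest to see on data with gaps. The sketch below uses a small, made-up event table (the dates and values are hypothetical) and a 3-day window. For offset-based windows such as "3D", `min_periods` defaults to 1, so a window with no rows at all yields NaN.

```python
import pandas as pd

# Hypothetical irregularly sampled readings (gaps of 1, 2, 3, and 1 days)
events = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2024-01-01", "2024-01-02", "2024-01-04", "2024-01-07", "2024-01-08"]
        ),
        "revenue": [10.0, 12.0, 9.0, 14.0, 15.0],
    }
)

# "3D" looks back 3 days of calendar time, not 3 rows.
# Default closed="right": the window is (t - 3 days, t], current row included.
events["sum_3d"] = events.rolling("3D", on="date")["revenue"].sum()

# closed="left": the window is [t - 3 days, t), current row excluded,
# useful when a feature must not see the value it is meant to predict.
events["sum_3d_prior"] = (
    events.rolling("3D", on="date", closed="left")["revenue"].sum()
)
print(events)
```

On 2024-01-04, for example, the right-closed window covers Jan 2 and Jan 4 (12 + 9), while the left-closed window covers Jan 1 and Jan 2 (10 + 12).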

Custom window functions

sales["rev_range"] = (
    sales["revenue"].rolling(4).apply(lambda x: x.max() - x.min(), raw=True)
)

Set raw=True so that apply receives plain NumPy arrays instead of Series objects, which is noticeably faster.
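As a sanity check on `raw=True`, this sketch computes the same rolling range three ways: with `raw=True` (NumPy arrays), with `raw=False` (pandas Series, the default), and from the built-in `max()` and `min()` aggregations. All three should agree; the built-in version avoids calling a Python function once per window, which is usually the fastest option when one exists.

```python
import pandas as pd

revenue = pd.Series([10, 12, 9, 14, 15, 13, 11, 16], dtype="float64")

# raw=True: each window arrives as a NumPy ndarray
range_apply = revenue.rolling(4).apply(lambda x: x.max() - x.min(), raw=True)

# raw=False (default): each window arrives as a pandas Series,
# slower, but the function can see the window's index labels
range_apply_series = revenue.rolling(4).apply(
    lambda s: s.max() - s.min(), raw=False
)

# Same statistic from two optimized built-in aggregations, no Python callback
range_builtin = revenue.rolling(4).max() - revenue.rolling(4).min()
```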

Expanding Windows (Cumulative)

sales["rev_cum_mean"] = sales["revenue"].expanding(min_periods=2).mean()

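Use expanding when every observation should see its full history, from the start of the series up to the current row (running means, cumulative ratios).
Combine it with shift() to compare the latest value against the historical average.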
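The shift-then-expand pattern mentioned above can be sketched like this, assuming the goal is to compare each day with the mean of all *earlier* days. Shifting first keeps today's value out of its own baseline.

```python
import pandas as pd

revenue = pd.Series(
    [10, 12, 9, 14, 15, 13, 11, 16],
    index=pd.date_range("2024-01-01", periods=8, freq="D"),
    dtype="float64",
)

# Mean of everything *before* today: shift(1) drops today's value from the history.
# The shifted NaN in the first row does not count toward min_periods.
prior_mean = revenue.shift(1).expanding(min_periods=2).mean()

# Positive -> today beats its own historical average
vs_history = revenue - prior_mean
```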

Exponentially Weighted Windows (EWM)

sales["rev_ewm_span4"] = sales["revenue"].ewm(span=4, adjust=False).mean()

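adjust=False uses the recursive formula y_t = (1 - α)·y_{t-1} + α·x_t (with α = 2/(span + 1)), which matches the typical smoothing used in many analytics dashboards.
halflife gives intuitive control over decay: ewm(halflife=3) means an observation's weight halves every 3 periods.
For irregular timestamps, pass the datetime values themselves to times= (for example times=df["date"] or a DatetimeIndex) together with a timedelta-like halflife such as halflife="2D", so weights decay with actual elapsed time rather than row count. times only applies to mean(), and pandas 2.x no longer accepts a column name string here.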
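To check the statements above numerically, the sketch below (a) reproduces `adjust=False` with an explicit loop, (b) shows that `halflife=3` is the same decay as `alpha = 1 - 0.5 ** (1/3)`, and (c) applies a time-aware EWM to a hypothetical, irregularly spaced log. Part (c) assumes a pandas version where `times` takes datetime values plus a timedelta-like `halflife` (such as "2D"), and it keeps the default `adjust=True`.

```python
import numpy as np
import pandas as pd

revenue = pd.Series([10, 12, 9, 14, 15, 13, 11, 16], dtype="float64")

# (a) span=4 corresponds to alpha = 2 / (span + 1) = 0.4
alpha = 2 / (4 + 1)
ewm_pandas = revenue.ewm(span=4, adjust=False).mean()

# adjust=False recursion: y[0] = x[0]; y[t] = (1 - alpha) * y[t-1] + alpha * x[t]
values = [revenue.iloc[0]]
for x in revenue.iloc[1:]:
    values.append((1 - alpha) * values[-1] + alpha * x)
manual = pd.Series(values, index=revenue.index)

# (b) halflife=3 -> alpha = 1 - exp(ln(0.5) / 3): weights halve every 3 rows
alpha_from_halflife = 1 - np.exp(np.log(0.5) / 3)
by_halflife = revenue.ewm(halflife=3).mean()
by_alpha = revenue.ewm(alpha=alpha_from_halflife).mean()

# (c) Hypothetical irregular log: pass datetime values (not a column name)
# and a timedelta-like halflife; `times` is only used by mean()
log = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2024-01-01", "2024-01-02", "2024-01-05", "2024-01-06"]
        ),
        "revenue": [10.0, 12.0, 9.0, 14.0],
    }
)
log["rev_ewm_2d"] = log["revenue"].ewm(halflife="2D", times=log["date"]).mean()
```

Because the log has a 3-day gap between Jan 2 and Jan 5, the older readings lose more weight than a row-based `ewm` would give them.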

Choosing the Right Window (Practical Reference)

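Goal	Recommended method	Notes
Smooth short-term noise	`rolling` with a small `window` and `center=True`	For numeric columns; use a small `min_periods` (e.g. 1) to get values at the series edges
Running total or average from the start of the series	`expanding`	No fixed window; good for cumulative KPIs
Let old observations fade gradually	`ewm(span=...)`	Better suited to momentum-style signals than a very large rolling window
Irregular timestamps	Time-based `rolling("7D", on="date")` or `ewm(halflife="7D", times=df["date"])`	Avoids bias toward periods that happen to be sampled more densely
Feature engineering	`rolling().agg(["mean","std","min","max"])`	Multiple aggregations quickly produce a tidy feature set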
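For the feature-engineering row, a minimal sketch: `agg` with a list of function names returns one column per statistic, which can then be prefixed to keep names unique. The `rev_w3_` prefix is just an illustrative naming choice.

```python
import pandas as pd

revenue = pd.Series(
    [10, 12, 9, 14, 15, 13, 11, 16],
    index=pd.date_range("2024-01-01", periods=8, freq="D"),
    name="revenue",
    dtype="float64",
)

# Four statistics over the same 3-row window -> one DataFrame column per function
features = revenue.rolling(window=3, min_periods=2).agg(["mean", "std", "min", "max"])

# Prefix the columns so they don't collide with other feature groups later
features = features.add_prefix("rev_w3_")
print(features)
```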

Performance and Correctness Tips

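For time-based windows, keep the time column as datetime64[ns], sort it in ascending order (time-based rolling requires a monotonic time axis), and preferably set it as the index.
Prefer built-in aggregations (mean, std, sum, count) over Python-level apply for better performance.
Avoid look-ahead bias: when building features for supervised learning, shift() first, then apply rolling.
If the source data is sampled irregularly, you can first resample it to a regular frequency and then apply rolling.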
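The tips above can be combined into a small pipeline sketch: normalize an irregular log to daily frequency with `resample`, then shift before rolling so each day's feature only uses earlier days. The event log is hypothetical, and filling empty days with 0 assumes that "no rows" really means "no revenue"; for quantities like prices you would more likely forward-fill or leave NaN.

```python
import pandas as pd

# Hypothetical irregular event log: two sales on Jan 1, none on Jan 3
log = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-04", "2024-01-05"]
        ),
        "revenue": [4.0, 6.0, 12.0, 9.0, 14.0],
    }
).sort_values("date")  # keep the time axis in chronological order

# 1) One row per calendar day; resample().sum() turns empty days into 0
daily = log.set_index("date")["revenue"].resample("D").sum()

# 2) Leaky version: the window ending at day t includes day t itself
leaky_3d_mean = daily.rolling(window=3, min_periods=1).mean()

# 3) Leakage-free version: shift first, so day t only sees days t-3..t-1
feature_prev_3d_mean = daily.shift(1).rolling(window=3, min_periods=1).mean()
```

Comparing the two on the last day (7.0 for the shifted feature vs about 7.67 for the leaky one) shows how much the current value bleeds into an unshifted window.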

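Rolling, expanding, and exponentially weighted windows cover the vast majority of smoothing and feature-engineering needs, with no loops required. Clean up the time axis first with pandas to_datetime and resample, and you can feed fast, reliable metrics into both charts and models.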
