Pandas Pivot vs Melt: Reshape Data the Right Way
Updated on
Reshaping data is often the bottleneck before analysis or visualization. Picking the wrong method can duplicate rows, drop data, or produce unusable wide tables.
PAS framing:
- Problem: Messy wide/long data slows down plotting and aggregation.
- Agitate: Using
pivotwhen you need aggregation raises errors; usingmeltincorrectly multiplies rows or loses identifiers. - Solution: Follow a tidy-data workflow—
pivot_tablewhen aggregation is required,pivotwhen keys are unique,meltto go long, andstack/unstackfor multi-index control.
Quick Reshape Guide
| Method | Direction | When to use | Handles duplicates? |
|---|---|---|---|
pivot | long → wide | Keys are unique per index/column pair | 🚫 (raises if duplicates) |
pivot_table | long → wide | Need aggregation or duplicates exist | ✅ via aggfunc |
melt | wide → long | Normalize columns into rows | N/A |
stack / unstack | wide ↔ long on index levels | MultiIndex reshaping | N/A |
Sample Data
import pandas as pd
sales = pd.DataFrame({
"region": ["NA", "NA", "EU", "EU"],
"quarter": ["Q1", "Q2", "Q1", "Q2"],
"product": ["A", "A", "A", "A"],
"revenue": [120, 140, 110, 150],
"units": [10, 11, 9, 12],
})Wide Output with pivot
wide = sales.pivot(index="region", columns="quarter", values="revenue")- Requires unique combinations of
region+quarter. - Use
.reset_index()to flatten after pivoting if needed.
Aggregated Wide Output with pivot_table
table = sales.pivot_table(
index="region",
columns="quarter",
values="revenue",
aggfunc="sum",
margins=True, # totals row/column
margins_name="Total"
)- Handles duplicate keys by aggregating.
fill_value=0can replace missing cells for clean exports.
Multiple Values and Aggregations
multi = sales.pivot_table(
index="region",
columns="quarter",
values=["revenue", "units"],
aggfunc={"revenue": "sum", "units": "mean"},
)Tidy column names afterward:
multi.columns = [f"{metric}_{col}" for metric, col in multi.columns]
multi = multi.reset_index()Long Format with melt
long = sales.melt(
id_vars=["region", "quarter"],
value_vars=["revenue", "units"],
var_name="metric",
value_name="value",
)- Ideal for plotting libraries that expect tidy data (one column per variable, one row per observation).
- Use
value_varsto control which columns to unpivot.
Stack and Unstack for MultiIndex
stacked = table.stack() # columns → rows
unstacked = stacked.unstack() # rows → columns- Great for rearranging levels without redefining keys.
- Control level with
stack(level="quarter")orunstack(level=-1).
Common Pitfalls
| Pitfall | Fix |
|---|---|
ValueError: Index contains duplicate entries with pivot | Switch to pivot_table with an aggregation. |
| Column order unexpected | Sort columns: table = table.reindex(sorted(table.columns), axis=1). |
| Multi-level columns hard to read | Flatten after pivot_table with a list comprehension. |
| Missing values after reshape | Use fill_value or stack(dropna=False) to preserve blanks. |
Workflow Tips
- Normalize first:
melt→groupby/agg→pivot_tablefor presentation. - For charts, keep data long; wide is best for reporting/exporting.
- After
pivot_table, export to Excel with formatting:table.to_excel("report.xlsx").
Related Guides
Key Takeaways
- Use
pivotfor unique keys,pivot_tablefor aggregation,meltto go long. - Flatten MultiIndex columns after complex pivots to keep exports tidy.
- Combine reshape steps with
groupbyto create analysis-ready datasets fast.