Skip to content

Pandas Pivot vs Melt: Reshape Data the Right Way

Updated on

Reshaping data is often the bottleneck before analysis or visualization. Picking the wrong method can duplicate rows, drop data, or produce unusable wide tables.

PAS framing:

  • Problem: Messy wide/long data slows down plotting and aggregation.
  • Agitate: Using pivot when you need aggregation raises errors; using melt incorrectly multiplies rows or loses identifiers.
  • Solution: Follow a tidy-data workflow—pivot_table when aggregation is required, pivot when keys are unique, melt to go long, and stack/unstack for multi-index control.

Quick Reshape Guide

MethodDirectionWhen to useHandles duplicates?
pivotlong → wideKeys are unique per index/column pair🚫 (raises if duplicates)
pivot_tablelong → wideNeed aggregation or duplicates exist✅ via aggfunc
meltwide → longNormalize columns into rowsN/A
stack / unstackwide ↔ long on index levelsMultiIndex reshapingN/A

Sample Data

import pandas as pd
 
sales = pd.DataFrame({
    "region": ["NA", "NA", "EU", "EU"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "product": ["A", "A", "A", "A"],
    "revenue": [120, 140, 110, 150],
    "units": [10, 11, 9, 12],
})

Wide Output with pivot

wide = sales.pivot(index="region", columns="quarter", values="revenue")
  • Requires unique combinations of region + quarter.
  • Use .reset_index() to flatten after pivoting if needed.

Aggregated Wide Output with pivot_table

table = sales.pivot_table(
    index="region",
    columns="quarter",
    values="revenue",
    aggfunc="sum",
    margins=True,       # totals row/column
    margins_name="Total"
)
  • Handles duplicate keys by aggregating.
  • fill_value=0 can replace missing cells for clean exports.

Multiple Values and Aggregations

multi = sales.pivot_table(
    index="region",
    columns="quarter",
    values=["revenue", "units"],
    aggfunc={"revenue": "sum", "units": "mean"},
)

Tidy column names afterward:

multi.columns = [f"{metric}_{col}" for metric, col in multi.columns]
multi = multi.reset_index()

Long Format with melt

long = sales.melt(
    id_vars=["region", "quarter"],
    value_vars=["revenue", "units"],
    var_name="metric",
    value_name="value",
)
  • Ideal for plotting libraries that expect tidy data (one column per variable, one row per observation).
  • Use value_vars to control which columns to unpivot.

Stack and Unstack for MultiIndex

stacked = table.stack()           # columns → rows
unstacked = stacked.unstack()     # rows → columns
  • Great for rearranging levels without redefining keys.
  • Control level with stack(level="quarter") or unstack(level=-1).

Common Pitfalls

PitfallFix
ValueError: Index contains duplicate entries with pivotSwitch to pivot_table with an aggregation.
Column order unexpectedSort columns: table = table.reindex(sorted(table.columns), axis=1).
Multi-level columns hard to readFlatten after pivot_table with a list comprehension.
Missing values after reshapeUse fill_value or stack(dropna=False) to preserve blanks.

Workflow Tips

  • Normalize first: meltgroupby/aggpivot_table for presentation.
  • For charts, keep data long; wide is best for reporting/exporting.
  • After pivot_table, export to Excel with formatting: table.to_excel("report.xlsx").

Related Guides


Key Takeaways

  • Use pivot for unique keys, pivot_table for aggregation, melt to go long.
  • Flatten MultiIndex columns after complex pivots to keep exports tidy.
  • Combine reshape steps with groupby to create analysis-ready datasets fast.