Pandas Pivot Table: Summarize and Reshape Data Like Excel (Guide)
Every analyst who has worked with Excel knows pivot tables. You drag a field to the rows area, another to the columns area, pick a summary function, and a dense spreadsheet of raw transactions turns into a clean summary showing totals by category, averages by region, or counts by month. Now you need to do the same thing in Python, but writing nested groupby calls with unstack feels clumsy and hard to read.
The real frustration hits when your data has multiple grouping levels, you need subtotals, or you want several aggregation functions applied to different columns at the same time. Chaining groupby, agg, and unstack together produces fragile code that breaks whenever the data changes shape.
The pandas pivot_table function brings the full power of Excel-style pivot tables into Python with a clean, declarative API. One function call handles grouping, aggregation, multi-level indexing, subtotals, and missing value handling. This guide covers every parameter, walks through practical examples, and compares pivot_table with groupby, pivot, and crosstab so you know exactly which tool to reach for.
What pd.pivot_table() Does
pd.pivot_table() creates a spreadsheet-style summary table from a DataFrame. You specify which columns to use as row labels (index), which to use as column headers, which values to aggregate, and what aggregation function to apply. The result is a new DataFrame where each cell contains a summary statistic.
Here is a simple before-and-after:
Raw data:
| region | product | revenue |
|---|---|---|
| North | Widget | 1200 |
| North | Gadget | 800 |
| South | Widget | 1500 |
| South | Gadget | 950 |
| North | Widget | 1400 |
| South | Widget | 1600 |
Pivot table (sum of revenue by region and product):
| region | Gadget | Widget |
|---|---|---|
| North | 800 | 2600 |
| South | 950 | 3100 |
Each cell contains the sum of revenue for that region-product combination.
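The same before-and-after can be reproduced in a few lines of code, using the six raw transactions from the table above:

```python
import pandas as pd

# The six raw transactions from the table above
raw = pd.DataFrame({
    'region': ['North', 'North', 'South', 'South', 'North', 'South'],
    'product': ['Widget', 'Gadget', 'Widget', 'Gadget', 'Widget', 'Widget'],
    'revenue': [1200, 800, 1500, 950, 1400, 1600],
})

# Sum revenue for every region-product combination
table = pd.pivot_table(raw, values='revenue', index='region',
                       columns='product', aggfunc='sum')
print(table)
```

The two North-Widget rows (1200 and 1400) collapse into a single cell holding 2600, exactly as in the summary table.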
pd.pivot_table() Syntax and Parameters
pd.pivot_table(data, values=None, index=None, columns=None, aggfunc='mean',
fill_value=None, margins=False, dropna=True, margins_name='All',
observed=False, sort=True)
Parameter Reference
| Parameter | Description | Default |
|---|---|---|
| data | The DataFrame to summarize | Required |
| values | Column(s) to aggregate | None (all numeric columns) |
| index | Column(s) to use as row labels | None |
| columns | Column(s) to use as column headers | None |
| aggfunc | Aggregation function(s): 'mean', 'sum', 'count', 'min', 'max', or a function/dict/list | 'mean' |
| fill_value | Value to replace NaN in the result (see also fillna()) | None |
| margins | Add row/column totals (subtotals) | False |
| margins_name | Name for the margins row/column | 'All' |
| dropna | Exclude columns whose entries are all NaN | True |
| observed | Only show observed categories for categorical columns | False |
| sort | Sort the result | True |
Sample Data for All Examples
Every example below uses this sales dataset:
import pandas as pd
import numpy as np
sales = pd.DataFrame({
'date': pd.to_datetime(['2025-01-15', '2025-01-20', '2025-02-10', '2025-02-18',
'2025-01-12', '2025-02-22', '2025-01-25', '2025-02-14',
'2025-01-30', '2025-02-05', '2025-01-18', '2025-02-28']),
'region': ['North', 'North', 'North', 'North', 'South', 'South',
'South', 'South', 'East', 'East', 'East', 'East'],
'product': ['Widget', 'Gadget', 'Widget', 'Gadget',
'Widget', 'Gadget', 'Widget', 'Gadget',
'Widget', 'Gadget', 'Widget', 'Gadget'],
'salesperson': ['Alice', 'Alice', 'Bob', 'Bob',
'Charlie', 'Charlie', 'Diana', 'Diana',
'Eve', 'Eve', 'Frank', 'Frank'],
'revenue': [1200, 800, 1400, 850, 1500, 950, 1100, 780, 1300, 900, 1250, 870],
'units': [10, 8, 12, 9, 15, 10, 11, 8, 13, 9, 12, 9]
})
print(sales)
Basic Pivot Table: Sum of Revenue by Region
table = pd.pivot_table(sales, values='revenue', index='region', aggfunc='sum')
print(table)
Output:
revenue
region
East 4320
North 4250
South 4330
This is equivalent to sales.groupby('region')['revenue'].sum(), but returns a DataFrame instead of a Series.
Adding Column Headers
Add the columns parameter to break down the summary by a second variable:
table = pd.pivot_table(sales, values='revenue', index='region',
columns='product', aggfunc='sum')
print(table)
Output:
product Gadget Widget
region
East 1770 2550
North 1650 2600
South 1730 2600
Now each cell shows the total revenue for a specific region-product pair.
Multiple Aggregation Functions
Pass a list of functions to aggfunc to compute multiple statistics at once:
table = pd.pivot_table(sales, values='revenue', index='region',
columns='product', aggfunc=['sum', 'mean', 'count'])
print(table)
Output:
sum mean count
product Gadget Widget Gadget Widget Gadget Widget
region
East 1770 2550 885.0 1275.0 2 2
North 1650 2600 825.0 1300.0 2 2
South 1730 2600 865.0 1300.0 2 2
The result has a MultiIndex on the columns, with the aggregation function at the top level and the product at the second level.
Different Aggregation Functions per Column
Use a dictionary for aggfunc to apply different functions to different value columns:
table = pd.pivot_table(sales, values=['revenue', 'units'], index='region',
columns='product',
aggfunc={'revenue': 'sum', 'units': 'mean'})
print(table)
Output:
revenue units
product Gadget Widget Gadget Widget
region
East 1770 2550 9.0 12.5
North 1650 2600 8.5 11.0
South 1730 2600 9.0 13.0
Revenue is summed while units are averaged -- exactly what you would do in an Excel pivot table with different summary functions per field.
Adding Subtotals with margins
The margins parameter adds a row and column showing the grand totals:
table = pd.pivot_table(sales, values='revenue', index='region',
columns='product', aggfunc='sum', margins=True)
print(table)
Output:
product Gadget Widget All
region
East 1770 2550 4320
North 1650 2600 4250
South 1730 2600 4330
All 5150 7750 12900
The All row shows total revenue per product. The All column shows total revenue per region. The bottom-right cell is the grand total.
You can customize the label with margins_name:
table = pd.pivot_table(sales, values='revenue', index='region',
columns='product', aggfunc='sum',
margins=True, margins_name='Total')
print(table)
Handling Missing Values with fill_value
When certain combinations do not exist in your data, the pivot table contains NaN. Use fill_value to replace them:
# Remove one row to create a missing combination
sales_missing = sales.drop(index=0)
table = pd.pivot_table(sales_missing, values='revenue', index='region',
columns='product', aggfunc='sum', fill_value=0)
print(table)
This replaces NaN with 0, which is cleaner for display and avoids issues in downstream calculations.
Multi-Level Index (Row Grouping)
Pass a list of columns to index to create hierarchical row labels:
table = pd.pivot_table(sales, values='revenue',
index=['region', 'salesperson'],
columns='product', aggfunc='sum')
print(table)
Output:
product Gadget Widget
region salesperson
East Eve 900.0 1300.0
Frank 870.0 1250.0
North Alice 800.0 1200.0
Bob 850.0 1400.0
South Charlie 950.0 1500.0
Diana 780.0 1100.0
Each salesperson is nested under their region, giving a drill-down view of the data.
Multi-Level Columns
Similarly, pass a list to columns for hierarchical column headers:
sales['month'] = sales['date'].dt.month_name()
table = pd.pivot_table(sales, values='revenue', index='region',
columns=['product', 'month'], aggfunc='sum', fill_value=0)
print(table)
This creates a two-level column header with product at the top and month at the second level.
Using Custom Aggregation Functions
You can pass any callable to aggfunc, including lambda functions and NumPy functions:
# Range (max - min) of revenue by region
table = pd.pivot_table(sales, values='revenue', index='region',
columns='product', aggfunc=lambda x: x.max() - x.min())
print(table)
Output:
product Gadget Widget
region
East 30 50
North 50 200
South 170 400
Other useful custom aggregations:
# Coefficient of variation
table = pd.pivot_table(sales, values='revenue', index='region',
aggfunc=lambda x: x.std() / x.mean() * 100)
Real-World Example: Student Grade Analysis
students = pd.DataFrame({
'student': ['Alice', 'Alice', 'Alice', 'Bob', 'Bob', 'Bob',
'Charlie', 'Charlie', 'Charlie', 'Diana', 'Diana', 'Diana'],
'subject': ['Math', 'Science', 'English'] * 4,
'semester': ['Fall', 'Fall', 'Fall', 'Fall', 'Fall', 'Fall',
'Spring', 'Spring', 'Spring', 'Spring', 'Spring', 'Spring'],
'score': [92, 88, 95, 78, 85, 72, 90, 93, 88, 85, 79, 91]
})
# Average score by subject and semester
table = pd.pivot_table(students, values='score', index='subject',
columns='semester', aggfunc='mean', margins=True)
print(table.round(1))
Output:
semester Fall Spring All
subject
English 83.5 89.5 86.5
Math 85.0 87.5 86.2
Science 86.5 86.0 86.2
All 85.0 87.7 86.3
This immediately shows that Spring semester averages are slightly higher overall, with English showing the biggest improvement.
Real-World Example: Monthly Sales Report
# Create a monthly sales summary
sales['month'] = sales['date'].dt.strftime('%Y-%m')
report = pd.pivot_table(sales, values=['revenue', 'units'],
index='region', columns='month',
aggfunc={'revenue': 'sum', 'units': 'sum'},
margins=True, fill_value=0)
print(report)
This produces the same kind of monthly summary report you would create in Excel, with totals for each region and month.
pivot_table vs pivot vs groupby vs crosstab
These four functions overlap in capability but each has a distinct use case:
| Feature | pivot_table() | pivot() | groupby() | crosstab() |
|---|---|---|---|---|
| Aggregation | Yes (any function) | No (raises on duplicates) | Yes (any function) | Yes (limited) |
| Handles duplicates | Yes, via aggfunc | No | Yes | Yes |
| Subtotals (margins) | Yes | No | No (manual) | Yes |
| Fill missing values | Yes (fill_value) | No | No | No (missing counts default to 0) |
| Input | DataFrame | DataFrame | DataFrame | Series/arrays |
| Output | DataFrame | DataFrame | DataFrame/Series | DataFrame |
| Multi-level index | Yes | Yes | Yes | Yes |
| Best for | Summarizing data with aggregation | Reshaping unique key-value data | Flexible grouped analysis | Frequency tables |
| Default function | mean | N/A | N/A | count |
When to Use Each
Use pivot_table() when you need Excel-style summaries with aggregation, subtotals, or multiple aggregation functions. It is the most powerful and readable option for creating summary tables.
Use pivot() when your data has unique combinations of index and column values and you just want to reshape without aggregation. To do the reverse -- turning columns into rows -- use melt(). pivot() is faster than pivot_table() because it skips the aggregation step.
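A quick sketch of the no-aggregation case (toy data assumed): when every region-product pair appears exactly once, pivot() simply rearranges the values into a wide table.

```python
import pandas as pd

# Each region-product pair appears exactly once -- no aggregation needed
unique_sales = pd.DataFrame({
    'region': ['North', 'North', 'South', 'South'],
    'product': ['Widget', 'Gadget', 'Widget', 'Gadget'],
    'revenue': [1200, 800, 1500, 950],
})

# pivot() reshapes without summarizing; each value passes through untouched
wide = unique_sales.pivot(index='region', columns='product', values='revenue')
print(wide)
```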
Use groupby() when you need grouped calculations but do not need a wide-format result. groupby returns long-format output by default. You can achieve pivot_table-like output with groupby().unstack(), but pivot_table is more readable.
Use crosstab() when you are computing frequency tables or cross-tabulations of categorical variables. crosstab() accepts Series or arrays directly (not just DataFrames) and defaults to counting.
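A minimal crosstab sketch (toy data assumed): two Series go in, a frequency table comes out.

```python
import pandas as pd

# crosstab works directly on Series -- no DataFrame required
region = pd.Series(['North', 'North', 'South', 'South', 'East', 'East'],
                   name='region')
product = pd.Series(['Widget', 'Gadget', 'Widget', 'Gadget', 'Widget', 'Gadget'],
                    name='product')

# The default aggregation is a count of co-occurrences
counts = pd.crosstab(region, product)
print(counts)
```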
Equivalence Example
These two approaches produce the same table:
# pivot_table approach
result1 = pd.pivot_table(sales, values='revenue', index='region',
columns='product', aggfunc='sum')
# groupby + unstack approach
result2 = sales.groupby(['region', 'product'])['revenue'].sum().unstack()
# Both produce the same table
print(result1.equals(result2))  # True
The pivot_table version is more readable, especially when you add margins, fill values, or multiple aggregation functions.
Flattening MultiIndex Columns
After creating a pivot table with multiple aggregation functions, you often get MultiIndex columns that are hard to work with:
table = pd.pivot_table(sales, values='revenue', index='region',
columns='product', aggfunc=['sum', 'mean'])
# Flatten the column MultiIndex
table.columns = ['_'.join(col).strip() for col in table.columns.values]
print(table)
Output:
sum_Gadget sum_Widget mean_Gadget mean_Widget
region
East 1770 2550 885.0 1275.0
North 1650 2600 825.0 1300.0
South 1730 2600 865.0 1300.0
Now the columns are flat strings that are easier to reference.
Sorting and Filtering Pivot Tables
Pivot tables are regular DataFrames, so you can sort and filter them:
table = pd.pivot_table(sales, values='revenue', index='region',
columns='product', aggfunc='sum', margins=True)
# Sort by total revenue (All column), descending
sorted_table = table.sort_values('All', ascending=False)
print(sorted_table)
# Filter to show only regions with Widget revenue > 2500
filtered = table[table['Widget'] > 2500]
print(filtered)
Exporting Pivot Tables
Save your pivot table to Excel (where stakeholders expect it) or CSV:
table = pd.pivot_table(sales, values='revenue', index='region',
columns='product', aggfunc='sum', margins=True)
# Export to Excel
table.to_excel('sales_pivot.xlsx', sheet_name='Revenue Summary')
# Export to CSV
table.to_csv('sales_pivot.csv')
Performance Tips
pivot_table() calls groupby internally, so performance is similar. For large datasets:
| Dataset Size | Expected Time |
|---|---|
| 100K rows, 2 group cols | ~5 ms |
| 1M rows, 2 group cols | ~50 ms |
| 10M rows, 3 group cols | ~500 ms |
Optimization strategies:
- Reduce data before pivoting -- filter rows and select only needed columns before calling pivot_table().
- Use categorical dtypes -- convert string columns to category dtype for faster grouping.
- Avoid lambda aggfuncs -- built-in string names ('sum', 'mean') use optimized C code. Lambda functions fall back to slow Python loops.
# Faster: use categorical dtypes
sales['region'] = sales['region'].astype('category')
sales['product'] = sales['product'].astype('category')
# Faster: use string name instead of lambda
table = pd.pivot_table(sales, values='revenue', index='region',
columns='product', aggfunc='sum')  # 'sum' is optimized
Visualize Pivot Tables with PyGWalker
While pd.pivot_table() is great for numerical summaries, sometimes you need to visualize the patterns in your data interactively. PyGWalker is an open-source Python library that lets you create pivot tables, bar charts, heatmaps, and more through a visual drag-and-drop interface -- no code required after the initial setup.
import pandas as pd
import pygwalker as pyg
sales = pd.DataFrame({
'region': ['North', 'North', 'South', 'South', 'East', 'East'] * 2,
'product': ['Widget', 'Gadget'] * 6,
'revenue': [1200, 800, 1500, 950, 1300, 900, 1400, 850, 1100, 780, 1250, 870],
'units': [10, 8, 15, 10, 13, 9, 12, 9, 11, 8, 12, 9]
})
# Launch interactive pivot table and visualization
walker = pyg.walk(sales)
PyGWalker gives you a Tableau-like interface where you can drag region to rows, product to columns, and revenue to values to create a pivot table visually. You can switch between table view and chart view instantly, try different aggregation functions, and export the result -- all without writing additional code.
Try PyGWalker in Google Colab, Kaggle, or install with pip install pygwalker.
FAQ
What is the difference between pivot and pivot_table in pandas?
pivot() reshapes data without aggregation -- it requires unique combinations of index and column values and raises an error if duplicates exist. pivot_table() handles duplicates by aggregating them using a function like sum or mean. Use pivot() for simple reshaping and pivot_table() when you need aggregation or subtotals.
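A small demonstration of the difference (toy data assumed): the duplicate North-Widget pair makes pivot() raise, while pivot_table() sums it.

```python
import pandas as pd

dup = pd.DataFrame({
    'region': ['North', 'North'],
    'product': ['Widget', 'Widget'],  # duplicate region-product pair
    'revenue': [1200, 1400],
})

# pivot() cannot decide which of the two values to keep, so it raises
try:
    dup.pivot(index='region', columns='product', values='revenue')
except ValueError as err:
    print('pivot() failed:', err)

# pivot_table() resolves the duplicates by aggregating them
agg = pd.pivot_table(dup, values='revenue', index='region',
                     columns='product', aggfunc='sum')
print(agg)
```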
How do I add totals (margins) to a pandas pivot table?
Set margins=True in the pivot_table() call: pd.pivot_table(df, values='revenue', index='region', columns='product', aggfunc='sum', margins=True). This adds an All row and column with subtotals. Customize the label with margins_name='Total'.
Can I use multiple aggregation functions in a pivot table?
Yes. Pass a list to aggfunc: aggfunc=['sum', 'mean', 'count']. This creates a MultiIndex column header with one level for the function and one for the value columns. You can also pass a dictionary to apply different functions to different columns: aggfunc={'revenue': 'sum', 'units': 'mean'}.
How is pivot_table different from groupby in pandas?
Both aggregate data, but pivot_table() produces a wide-format result (with column headers from one of the grouping variables), while groupby() produces a long-format result by default. pivot_table() also supports margins (subtotals) and fill values natively. Internally, pivot_table() uses groupby().
How do I handle NaN values in a pivot table?
Use the fill_value parameter to replace NaN with a specific value: pd.pivot_table(df, ..., fill_value=0). The dropna=True parameter (default) excludes columns where all entries are NaN.
Can I create a pivot table with percentage values?
Yes. Create the pivot table with counts or sums first, then divide by the total. You can also use normalize in pd.crosstab() for percentage-based cross-tabulations. For pivot tables, calculate percentages manually: table = table.div(table.sum(axis=1), axis=0) * 100.
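A short sketch of that manual percentage calculation (toy data assumed), normalizing each row so it sums to 100:

```python
import pandas as pd

sales = pd.DataFrame({
    'region': ['North', 'North', 'South', 'South'],
    'product': ['Widget', 'Gadget', 'Widget', 'Gadget'],
    'revenue': [1200, 800, 1500, 950],
})
table = pd.pivot_table(sales, values='revenue', index='region',
                       columns='product', aggfunc='sum')

# Divide each cell by its row total, then scale to percentages
pct = table.div(table.sum(axis=1), axis=0) * 100
print(pct.round(1))
```

Swap axis=1 for axis=0 (and divide by table.sum(axis=0) along axis=1) to get column percentages instead.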
Conclusion
The pandas pivot_table() function is the most versatile tool for creating summary tables in Python. Here are the key points to remember:
- Use values, index, and columns to define what to summarize, how to group rows, and how to group columns.
- Use aggfunc to specify the aggregation function. Pass a list for multiple functions or a dictionary for per-column functions.
- Use margins=True to add subtotals -- the equivalent of "Grand Total" in Excel.
- Use fill_value to replace missing combinations with a default value (typically 0).
- Prefer pivot_table() over groupby().unstack() for readability, especially when you need margins or fill values.
- Prefer pivot() over pivot_table() when your data has unique key combinations and you do not need aggregation.
- Prefer crosstab() for simple frequency tables of categorical variables.
For interactive exploration of your pivot tables, PyGWalker provides a visual, drag-and-drop interface that mirrors the Excel pivot table experience inside Jupyter Notebook.
Related Guides
- Pandas Melt: Reshape Wide Data to Long Format
- Pandas GroupBy: Aggregation, Transform, Apply
- Pandas fillna(): Handle Missing Values
- Pandas sort_values(): Sort DataFrames
- Pandas Crosstab