Pandas Pivot Table: Summarize and Reshape Data Like Excel (Guide)

Every analyst who has worked with Excel knows pivot tables. You drag a field to the rows area, another to the columns area, pick a summary function, and a dense spreadsheet of raw transactions turns into a clean summary showing totals by category, averages by region, or counts by month. Now you need to do the same thing in Python, but writing nested groupby calls with unstack feels clumsy and hard to read.

The real frustration hits when your data has multiple grouping levels, you need subtotals, or you want several aggregation functions applied to different columns at the same time. Chaining groupby, agg, and unstack together produces fragile code that breaks whenever the data changes shape.

The pandas pivot_table function brings the full power of Excel-style pivot tables into Python with a clean, declarative API. One function call handles grouping, aggregation, multi-level indexing, subtotals, and missing value handling. This guide covers every parameter, walks through practical examples, and compares pivot_table with groupby, pivot, and crosstab so you know exactly which tool to reach for.

What pd.pivot_table() Does

pd.pivot_table() creates a spreadsheet-style summary table from a DataFrame. You specify which columns to use as row labels (index), which to use as column headers, which values to aggregate, and what aggregation function to apply. The result is a new DataFrame where each cell contains a summary statistic.

Here is a simple before-and-after:

Raw data:

| region | product | revenue |
|--------|---------|---------|
| North  | Widget  | 1200    |
| North  | Gadget  | 800     |
| South  | Widget  | 1500    |
| South  | Gadget  | 950     |
| North  | Widget  | 1400    |
| South  | Widget  | 1600    |

Pivot table (sum of revenue by region and product):

| region | Gadget | Widget |
|--------|--------|--------|
| North  | 800    | 2600   |
| South  | 950    | 3100   |

Each cell contains the sum of revenue for that region-product combination.

pd.pivot_table() Syntax and Parameters

pd.pivot_table(data, values=None, index=None, columns=None, aggfunc='mean',
               fill_value=None, margins=False, dropna=True, margins_name='All',
               observed=False, sort=True)

Parameter Reference

| Parameter | Description | Default |
|-----------|-------------|---------|
| data | The DataFrame to summarize | Required |
| values | Column(s) to aggregate | None (all numeric columns) |
| index | Column(s) to use as row labels | None |
| columns | Column(s) to use as column headers | None |
| aggfunc | Aggregation function(s): 'mean', 'sum', 'count', 'min', 'max', or a function/dict/list | 'mean' |
| fill_value | Value to replace NaN in the result (see also fillna()) | None |
| margins | Add row/column totals (subtotals) | False |
| margins_name | Name for the margins row/column | 'All' |
| dropna | Exclude columns whose entries are all NaN | True |
| observed | Only show observed categories for categorical columns | False |
| sort | Sort the result | True |
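
The observed parameter only matters when your grouping columns use the category dtype. Here is a minimal sketch; the tiny DataFrame is illustrative, not the guide's sample data:

```python
import pandas as pd

# A categorical column with a category ('West') that never appears in the data
df = pd.DataFrame({
    'region': pd.Categorical(['North', 'South', 'North'],
                             categories=['North', 'South', 'West']),
    'revenue': [1200, 1500, 1400],
})

# observed=True keeps only categories that actually occur in the data,
# so the unused 'West' category does not produce an empty row
table = pd.pivot_table(df, values='revenue', index='region',
                       aggfunc='sum', observed=True)
print(table)
```

With the default observed=False, the result would include a 'West' row full of NaN (or fill_value).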

Sample Data for All Examples

Every example below uses this sales dataset:

import pandas as pd
import numpy as np
 
sales = pd.DataFrame({
    'date': pd.to_datetime(['2025-01-15', '2025-01-20', '2025-02-10', '2025-02-18',
                            '2025-01-12', '2025-02-22', '2025-01-25', '2025-02-14',
                            '2025-01-30', '2025-02-05', '2025-01-18', '2025-02-28']),
    'region': ['North', 'North', 'North', 'North', 'South', 'South',
               'South', 'South', 'East', 'East', 'East', 'East'],
    'product': ['Widget', 'Gadget', 'Widget', 'Gadget',
                'Widget', 'Gadget', 'Widget', 'Gadget',
                'Widget', 'Gadget', 'Widget', 'Gadget'],
    'salesperson': ['Alice', 'Alice', 'Bob', 'Bob',
                    'Charlie', 'Charlie', 'Diana', 'Diana',
                    'Eve', 'Eve', 'Frank', 'Frank'],
    'revenue': [1200, 800, 1400, 850, 1500, 950, 1100, 780, 1300, 900, 1250, 870],
    'units': [10, 8, 12, 9, 15, 10, 11, 8, 13, 9, 12, 9]
})
 
print(sales)

Basic Pivot Table: Sum of Revenue by Region

table = pd.pivot_table(sales, values='revenue', index='region', aggfunc='sum')
print(table)

Output:

        revenue
region
East       4320
North      4250
South      4330

This is equivalent to sales.groupby('region')['revenue'].sum(), but returns a DataFrame instead of a Series.

Adding Column Headers

Add the columns parameter to break down the summary by a second variable:

table = pd.pivot_table(sales, values='revenue', index='region',
                       columns='product', aggfunc='sum')
print(table)

Output:

product  Gadget  Widget
region
East       1770    2550
North      1650    2600
South      1730    2600

Now each cell shows the total revenue for a specific region-product pair.

Multiple Aggregation Functions

Pass a list of functions to aggfunc to compute multiple statistics at once:

table = pd.pivot_table(sales, values='revenue', index='region',
                       columns='product', aggfunc=['sum', 'mean', 'count'])
print(table)

Output:

          sum          mean                count
product Gadget Widget Gadget  Widget      Gadget Widget
region
East      1770   2550  885.0  1275.0           2      2
North     1650   2600  825.0  1300.0           2      2
South     1730   2600  865.0  1300.0           2      2

The result has a MultiIndex on the columns, with the aggregation function at the top level and the product at the second level.
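
Because the result is a regular DataFrame, you can pull a single statistic out by indexing the top column level. A small sketch, using a cut-down version of the sample data:

```python
import pandas as pd

# Cut-down version of the guide's sales data
sales = pd.DataFrame({
    'region': ['North', 'North', 'South', 'South'],
    'product': ['Widget', 'Gadget', 'Widget', 'Gadget'],
    'revenue': [1200, 800, 1500, 950],
})

table = pd.pivot_table(sales, values='revenue', index='region',
                       columns='product', aggfunc=['sum', 'mean'])

# Select just the 'sum' block from the top level of the column MultiIndex
sums = table['sum']
print(sums)
```

The result is a plain region-by-product table containing only the sums.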

Different Aggregation Functions per Column

Use a dictionary for aggfunc to apply different functions to different value columns:

table = pd.pivot_table(sales, values=['revenue', 'units'], index='region',
                       columns='product',
                       aggfunc={'revenue': 'sum', 'units': 'mean'})
print(table)

Output:

       revenue         units
product Gadget Widget Gadget Widget
region
East      1770   2550    9.0   12.5
North     1650   2600    8.5   11.0
South     1730   2600    9.0   13.0

Revenue is summed while units are averaged -- exactly what you would do in an Excel pivot table with different summary functions per field.

Adding Subtotals with margins

The margins parameter adds a row and column showing the grand totals:

table = pd.pivot_table(sales, values='revenue', index='region',
                       columns='product', aggfunc='sum', margins=True)
print(table)

Output:

product  Gadget  Widget    All
region
East       1770    2550   4320
North      1650    2600   4250
South      1730    2600   4330
All        5150    7750  12900

The All row shows total revenue per product. The All column shows total revenue per region. The bottom-right cell is the grand total.

You can customize the label with margins_name:

table = pd.pivot_table(sales, values='revenue', index='region',
                       columns='product', aggfunc='sum',
                       margins=True, margins_name='Total')
print(table)

Handling Missing Values with fill_value

When certain combinations do not exist in your data, the pivot table contains NaN. Use fill_value to replace them:

# Remove both North-Widget rows to create a missing combination
sales_missing = sales.drop(index=[0, 2])
 
table = pd.pivot_table(sales_missing, values='revenue', index='region',
                       columns='product', aggfunc='sum', fill_value=0)
print(table)

This replaces NaN with 0, which is cleaner for display and avoids issues in downstream calculations.

Multi-Level Index (Row Grouping)

Pass a list of columns to index to create hierarchical row labels:

table = pd.pivot_table(sales, values='revenue',
                       index=['region', 'salesperson'],
                       columns='product', aggfunc='sum')
print(table)

Output:

product                Gadget  Widget
region salesperson
East   Eve              900.0  1300.0
       Frank            870.0  1250.0
North  Alice            800.0  1200.0
       Bob              850.0  1400.0
South  Charlie          950.0  1500.0
       Diana            780.0  1100.0

Each salesperson is nested under their region, giving a drill-down view of the data.
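
Because the rows form a MultiIndex, .loc on the first level drills into a single region. A short sketch with a cut-down version of the sample data:

```python
import pandas as pd

# Cut-down version of the guide's sales data
sales = pd.DataFrame({
    'region': ['North', 'North', 'North', 'North', 'South', 'South'],
    'salesperson': ['Alice', 'Alice', 'Bob', 'Bob', 'Charlie', 'Charlie'],
    'product': ['Widget', 'Gadget', 'Widget', 'Gadget', 'Widget', 'Gadget'],
    'revenue': [1200, 800, 1400, 850, 1500, 950],
})

table = pd.pivot_table(sales, values='revenue',
                       index=['region', 'salesperson'],
                       columns='product', aggfunc='sum')

# Drill into a single region of the hierarchical row index
north = table.loc['North']
print(north)
```

The first index level is dropped, leaving a salesperson-by-product table for that region.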

Multi-Level Columns

Similarly, pass a list to columns for hierarchical column headers:

sales['month'] = sales['date'].dt.month_name()
 
table = pd.pivot_table(sales, values='revenue', index='region',
                       columns=['product', 'month'], aggfunc='sum', fill_value=0)
print(table)

This creates a two-level column header with product at the top and month at the second level.
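
The same top-level indexing works on hierarchical columns: selecting a product returns its per-month sub-table. A sketch with a minimal dataset:

```python
import pandas as pd

# Minimal illustrative data spanning two months
sales = pd.DataFrame({
    'date': pd.to_datetime(['2025-01-15', '2025-02-10', '2025-01-12', '2025-02-22']),
    'region': ['North', 'North', 'South', 'South'],
    'product': ['Widget', 'Widget', 'Gadget', 'Gadget'],
    'revenue': [1200, 1400, 950, 870],
})
sales['month'] = sales['date'].dt.month_name()

table = pd.pivot_table(sales, values='revenue', index='region',
                       columns=['product', 'month'], aggfunc='sum', fill_value=0)

# Select all months for one product from the top column level
widget = table['Widget']
print(widget)
```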

Using Custom Aggregation Functions

You can pass any callable to aggfunc, including lambda functions and NumPy functions:

# Range (max - min) of revenue by region
table = pd.pivot_table(sales, values='revenue', index='region',
                       columns='product', aggfunc=lambda x: x.max() - x.min())
print(table)

Output:

product  Gadget  Widget
region
East         30      50
North        50     200
South       170     400

Another useful custom aggregation:

# Coefficient of variation
table = pd.pivot_table(sales, values='revenue', index='region',
                       aggfunc=lambda x: x.std() / x.mean() * 100)

Real-World Example: Student Grade Analysis

students = pd.DataFrame({
    'student': ['Alice', 'Alice', 'Alice', 'Bob', 'Bob', 'Bob',
                'Charlie', 'Charlie', 'Charlie', 'Diana', 'Diana', 'Diana'],
    'subject': ['Math', 'Science', 'English'] * 4,
    'semester': ['Fall', 'Fall', 'Fall', 'Fall', 'Fall', 'Fall',
                 'Spring', 'Spring', 'Spring', 'Spring', 'Spring', 'Spring'],
    'score': [92, 88, 95, 78, 85, 72, 90, 93, 88, 85, 79, 91]
})
 
# Average score by subject and semester
table = pd.pivot_table(students, values='score', index='subject',
                       columns='semester', aggfunc='mean', margins=True)
print(table.round(1))

Output:

semester   Fall  Spring   All
subject
English    83.5    89.5  86.5
Math       85.0    87.5  86.2
Science    86.5    86.0  86.2
All        85.0    87.7  86.3

This immediately shows that Spring semester averages are slightly higher overall, with English showing the biggest improvement.

Real-World Example: Monthly Sales Report

# Create a monthly sales summary
sales['month'] = sales['date'].dt.strftime('%Y-%m')
 
report = pd.pivot_table(sales, values=['revenue', 'units'],
                        index='region', columns='month',
                        aggfunc={'revenue': 'sum', 'units': 'sum'},
                        margins=True, fill_value=0)
print(report)

This produces the same kind of monthly summary report you would create in Excel, with totals for each region and month.

pivot_table vs pivot vs groupby vs crosstab

These four functions overlap in capability but each has a distinct use case:

| Feature | pivot_table() | pivot() | groupby() | crosstab() |
|---------|---------------|---------|-----------|------------|
| Aggregation | Yes (any function) | No (raises on duplicates) | Yes (any function) | Yes (limited) |
| Handles duplicates | Yes, via aggfunc | No | Yes | Yes |
| Subtotals (margins) | Yes | No | No (manual) | Yes |
| Fill missing values | Yes (fill_value) | No | No | Yes (fill_value) |
| Input | DataFrame | DataFrame | DataFrame | Series/arrays |
| Output | DataFrame | DataFrame | DataFrame/Series | DataFrame |
| Multi-level index | Yes | Yes | Yes | Yes |
| Best for | Summarizing data with aggregation | Reshaping unique key-value data | Flexible grouped analysis | Frequency tables |
| Default function | mean | N/A | N/A | count |

When to Use Each

Use pivot_table() when you need Excel-style summaries with aggregation, subtotals, or multiple aggregation functions. It is the most powerful and readable option for creating summary tables.

Use pivot() when your data has unique combinations of index and column values and you just want to reshape without aggregation. To do the reverse -- turning columns into rows -- use melt(). pivot() is faster than pivot_table() because it skips the aggregation step.
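
A quick sketch of that distinction, using a small illustrative DataFrame with unique region-product pairs:

```python
import pandas as pd

# Unique region-product pairs: pivot() reshapes without aggregating
unique_sales = pd.DataFrame({
    'region': ['North', 'North', 'South', 'South'],
    'product': ['Widget', 'Gadget', 'Widget', 'Gadget'],
    'revenue': [1200, 800, 1500, 950],
})

wide = unique_sales.pivot(index='region', columns='product', values='revenue')
print(wide)

# melt() is the reverse operation: wide format back to long format
long_form = wide.reset_index().melt(id_vars='region', value_name='revenue')
print(long_form)
```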

Use groupby() when you need grouped calculations but do not need a wide-format result. groupby returns long-format output by default. You can achieve pivot_table-like output with groupby().unstack(), but pivot_table is more readable.

Use crosstab() when you are computing frequency tables or cross-tabulations of categorical variables. crosstab() accepts Series or arrays directly (not just DataFrames) and defaults to counting.
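
A minimal crosstab sketch with illustrative data:

```python
import pandas as pd

sales = pd.DataFrame({
    'region': ['North', 'North', 'North', 'South', 'South', 'East'],
    'product': ['Widget', 'Widget', 'Gadget', 'Widget', 'Gadget', 'Gadget'],
})

# crosstab counts occurrences of each region-product combination by default
counts = pd.crosstab(sales['region'], sales['product'])
print(counts)
```

Pass normalize='index' or normalize='columns' to turn the counts into proportions.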

Equivalence Example

These two approaches produce the same result:

# pivot_table approach
result1 = pd.pivot_table(sales, values='revenue', index='region',
                         columns='product', aggfunc='sum')
 
# groupby + unstack approach
result2 = sales.groupby(['region', 'product'])['revenue'].sum().unstack()
 
# Both produce the same table
print(result1.equals(result2))  # True

The pivot_table version is more readable, especially when you add margins, fill values, or multiple aggregation functions.

Flattening MultiIndex Columns

After creating a pivot table with multiple aggregation functions, you often get MultiIndex columns that are hard to work with:

table = pd.pivot_table(sales, values='revenue', index='region',
                       columns='product', aggfunc=['sum', 'mean'])
 
# Flatten the column MultiIndex
table.columns = ['_'.join(col).strip() for col in table.columns.values]
print(table)

Output:

        sum_Gadget  sum_Widget  mean_Gadget  mean_Widget
region
East          1770        2550        885.0       1275.0
North         1650        2600        825.0       1300.0
South         1730        2600        865.0       1300.0

Now the columns are flat strings that are easier to reference.

Sorting and Filtering Pivot Tables

Pivot tables are regular DataFrames, so you can sort and filter them:

table = pd.pivot_table(sales, values='revenue', index='region',
                       columns='product', aggfunc='sum', margins=True)
 
# Sort by total revenue (All column), descending
sorted_table = table.sort_values('All', ascending=False)
print(sorted_table)
 
# Filter to show only regions with Widget revenue > 2500
filtered = table[table['Widget'] > 2500]
print(filtered)

Exporting Pivot Tables

Save your pivot table to Excel (where stakeholders expect it) or CSV:

table = pd.pivot_table(sales, values='revenue', index='region',
                       columns='product', aggfunc='sum', margins=True)
 
# Export to Excel
table.to_excel('sales_pivot.xlsx', sheet_name='Revenue Summary')
 
# Export to CSV
table.to_csv('sales_pivot.csv')

Performance Tips

pivot_table() calls groupby() internally, so performance is comparable. The timings below are rough order-of-magnitude estimates; actual times depend on hardware and dtypes:

| Dataset size | Approximate time |
|--------------|------------------|
| 100K rows, 2 group cols | ~5 ms |
| 1M rows, 2 group cols | ~50 ms |
| 10M rows, 3 group cols | ~500 ms |

Optimization strategies:

  1. Reduce data before pivoting -- filter rows and select only needed columns before calling pivot_table().
  2. Use categorical dtypes -- convert string columns to category dtype for faster grouping.
  3. Avoid lambda aggfuncs -- built-in string names ('sum', 'mean') use optimized C code. Lambda functions fall back to slow Python loops.
# Faster: use categorical dtypes
sales['region'] = sales['region'].astype('category')
sales['product'] = sales['product'].astype('category')
 
# Faster: use string name instead of lambda
table = pd.pivot_table(sales, values='revenue', index='region',
                       columns='product', aggfunc='sum')  # 'sum' is optimized

Visualize Pivot Tables with PyGWalker

While pd.pivot_table() is great for numerical summaries, sometimes you need to visualize the patterns in your data interactively. PyGWalker is an open-source Python library that lets you create pivot tables, bar charts, heatmaps, and more through a visual drag-and-drop interface -- no code required after the initial setup.

import pandas as pd
import pygwalker as pyg
 
sales = pd.DataFrame({
    'region': ['North', 'North', 'South', 'South', 'East', 'East'] * 2,
    'product': ['Widget', 'Gadget'] * 6,
    'revenue': [1200, 800, 1500, 950, 1300, 900, 1400, 850, 1100, 780, 1250, 870],
    'units': [10, 8, 15, 10, 13, 9, 12, 9, 11, 8, 12, 9]
})
 
# Launch interactive pivot table and visualization
walker = pyg.walk(sales)

PyGWalker gives you a Tableau-like interface where you can drag region to rows, product to columns, and revenue to values to create a pivot table visually. You can switch between table view and chart view instantly, try different aggregation functions, and export the result -- all without writing additional code.

Try PyGWalker in Google Colab or Kaggle, or install it with pip install pygwalker.

FAQ

What is the difference between pivot and pivot_table in pandas?

pivot() reshapes data without aggregation -- it requires unique combinations of index and column values and raises an error if duplicates exist. pivot_table() handles duplicates by aggregating them using a function like sum or mean. Use pivot() for simple reshaping and pivot_table() when you need aggregation or subtotals.
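
A short sketch of the difference, using an illustrative DataFrame with a duplicated key pair:

```python
import pandas as pd

dup = pd.DataFrame({
    'region': ['North', 'North'],
    'product': ['Widget', 'Widget'],   # duplicate region-product pair
    'revenue': [1200, 1400],
})

# pivot() cannot decide which of the two values to keep, so it raises
try:
    dup.pivot(index='region', columns='product', values='revenue')
except ValueError as e:
    print('pivot() failed:', e)

# pivot_table() aggregates the duplicates instead
table = pd.pivot_table(dup, values='revenue', index='region',
                       columns='product', aggfunc='sum')
print(table)
```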

How do I add totals (margins) to a pandas pivot table?

Set margins=True in the pivot_table() call: pd.pivot_table(df, values='revenue', index='region', columns='product', aggfunc='sum', margins=True). This adds an All row and column with subtotals. Customize the label with margins_name='Total'.

Can I use multiple aggregation functions in a pivot table?

Yes. Pass a list to aggfunc: aggfunc=['sum', 'mean', 'count']. This creates a MultiIndex column header with one level for the function and one for the value columns. You can also pass a dictionary to apply different functions to different columns: aggfunc={'revenue': 'sum', 'units': 'mean'}.

How is pivot_table different from groupby in pandas?

Both aggregate data, but pivot_table() produces a wide-format result (with column headers from one of the grouping variables), while groupby() produces a long-format result by default. pivot_table() also supports margins (subtotals) and fill values natively. Internally, pivot_table() uses groupby().

How do I handle NaN values in a pivot table?

Use the fill_value parameter to replace NaN with a specific value: pd.pivot_table(df, ..., fill_value=0). The dropna=True parameter (default) excludes columns where all entries are NaN.

Can I create a pivot table with percentage values?

Yes. Create the pivot table with counts or sums first, then divide by the total. You can also use normalize in pd.crosstab() for percentage-based cross-tabulations. For pivot tables, calculate percentages manually: table = table.div(table.sum(axis=1), axis=0) * 100.
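
A sketch of the manual row-percentage calculation, using a small illustrative dataset:

```python
import pandas as pd

sales = pd.DataFrame({
    'region': ['North', 'North', 'South', 'South'],
    'product': ['Widget', 'Gadget', 'Widget', 'Gadget'],
    'revenue': [1200, 800, 1500, 500],
})

table = pd.pivot_table(sales, values='revenue', index='region',
                       columns='product', aggfunc='sum')

# Divide each cell by its row total to get the share of each product per region
pct = table.div(table.sum(axis=1), axis=0) * 100
print(pct.round(1))
```

Use table.sum(axis=0) with axis=1 in div() instead to compute column percentages.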

Conclusion

The pandas pivot_table() function is the most versatile tool for creating summary tables in Python. Here are the key points to remember:

  • Use values, index, and columns to define what to summarize, how to group rows, and how to group columns.
  • Use aggfunc to specify the aggregation function. Pass a list for multiple functions or a dictionary for per-column functions.
  • Use margins=True to add subtotals -- the equivalent of "Grand Total" in Excel.
  • Use fill_value to replace missing combinations with a default value (typically 0).
  • Prefer pivot_table() over groupby().unstack() for readability, especially when you need margins or fill values.
  • Prefer pivot() over pivot_table() when your data has unique key combinations and you do not need aggregation.
  • Prefer crosstab() for simple frequency tables of categorical variables.

For interactive exploration of your pivot tables, PyGWalker provides a visual, drag-and-drop interface that mirrors the Excel pivot table experience inside Jupyter Notebook.
