Pandas Melt: Reshape Wide Data to Long Format (Complete Guide)
Updated on
You have a spreadsheet where each month is a separate column -- January, February, March, all the way to December. Twelve columns of revenue numbers, and you need to plot a time series, run a groupby, or feed the data into a machine learning model. None of those operations work well with wide-format data. You need one column for the month name and one column for the revenue value. That transformation -- turning columns into rows -- is exactly what pandas melt does.
The problem gets worse fast. Wide datasets with dozens or hundreds of columns are common in survey data, sensor readings, financial reports, and pivoted exports from Excel or SQL. Manually restructuring them is tedious and error-prone. Every extra column you need to unpivot means more boilerplate code if you try to do it without the right tool.
The pd.melt() function solves this in a single call. It takes a wide DataFrame and "melts" it into long format by turning selected columns into rows, preserving your identifier columns intact. This guide covers the full syntax, every parameter, practical examples from real-world scenarios, and comparisons with related reshaping functions like pivot, stack, and wide_to_long.
What pd.melt() Does
pd.melt() unpivots a DataFrame from wide format to long format. In wide format, each variable has its own column. In long format (also called "tidy" format), there is one column for variable names and one column for values.
Here is the conceptual transformation:
Wide format (before melt):
| student | math | science | english |
|---|---|---|---|
| Alice | 90 | 85 | 88 |
| Bob | 78 | 92 | 80 |
Long format (after melt):
| student | subject | score |
|---|---|---|
| Alice | math | 90 |
| Alice | science | 85 |
| Alice | english | 88 |
| Bob | math | 78 |
| Bob | science | 92 |
| Bob | english | 80 |
The student column is preserved as an identifier, while the three subject columns are melted into two columns: one holding the subject name, and one holding the score.
pd.melt() Syntax and Parameters
pd.melt(frame, id_vars=None, value_vars=None, var_name=None,
value_name='value', col_level=None, ignore_index=True)You can also call it as a method on the DataFrame itself:
df.melt(id_vars=None, value_vars=None, var_name=None,
value_name='value', col_level=None, ignore_index=True)Parameter Reference
| Parameter | Description | Default |
|---|---|---|
frame | The DataFrame to melt (not needed when using df.melt()) | Required |
id_vars | Column(s) to keep as identifier variables (not melted) | None |
value_vars | Column(s) to unpivot into rows. If omitted, all columns not in id_vars are used | None |
var_name | Name for the new column holding the original column names | 'variable' |
value_name | Name for the new column holding the values | 'value' |
col_level | If columns are a MultiIndex, the level to melt | None |
ignore_index | If True, the result uses a new integer index. If False, the original index is kept | True |
Basic Example: Student Grades
Start with the student grades example from above:
import pandas as pd
grades = pd.DataFrame({
'student': ['Alice', 'Bob', 'Charlie'],
'math': [90, 78, 85],
'science': [85, 92, 88],
'english': [88, 80, 91]
})
long = pd.melt(grades, id_vars=['student'], value_vars=['math', 'science', 'english'],
var_name='subject', value_name='score')
print(long)Output:
student subject score
0 Alice math 90
1 Bob math 78
2 Charlie math 85
3 Alice science 85
4 Bob science 92
5 Charlie science 88
6 Alice english 88
7 Bob english 80
8 Charlie english 91Each row now represents one student-subject combination. The original 3x4 DataFrame (3 rows, 4 columns) becomes a 9x3 DataFrame (9 rows, 3 columns).
Omitting value_vars
If you leave out value_vars, pandas melts every column that is not listed in id_vars:
long = grades.melt(id_vars=['student'], var_name='subject', value_name='score')
print(long)This produces the same result as the previous example. Omitting value_vars is convenient when you want to melt all non-identifier columns.
Melting Without id_vars
You can also melt without specifying any identifier columns. This puts every column into the variable/value pair:
temperatures = pd.DataFrame({
'Jan': [30, 28],
'Feb': [32, 31],
'Mar': [45, 42]
})
long = temperatures.melt(var_name='month', value_name='temp_f')
print(long)Output:
month temp_f
0 Jan 30
1 Jan 28
2 Feb 32
3 Feb 31
4 Mar 45
5 Mar 42This is useful when every column is a measurement and there is no identifier to preserve.
Multiple Identifier Columns
Real datasets often have more than one identifier. You can pass a list of column names to id_vars:
sales = pd.DataFrame({
'region': ['North', 'South', 'North', 'South'],
'product': ['Widget', 'Widget', 'Gadget', 'Gadget'],
'q1_revenue': [1200, 1500, 800, 950],
'q2_revenue': [1400, 1600, 900, 1100],
'q3_revenue': [1100, 1450, 850, 1000],
'q4_revenue': [1500, 1700, 1000, 1200]
})
long_sales = sales.melt(
id_vars=['region', 'product'],
var_name='quarter',
value_name='revenue'
)
print(long_sales)Output:
region product quarter revenue
0 North Widget q1_revenue 1200
1 South Widget q1_revenue 1500
2 North Gadget q1_revenue 800
3 South Gadget q1_revenue 950
4 North Widget q2_revenue 1400
5 South Widget q2_revenue 1600
6 North Gadget q2_revenue 900
7 South Gadget q2_revenue 1100
8 North Widget q3_revenue 1100
9 South Widget q3_revenue 1450
10 North Gadget q3_revenue 850
11 South Gadget q3_revenue 1000
12 North Widget q4_revenue 1500
13 South Widget q4_revenue 1700
14 North Gadget q4_revenue 1000
15 South Gadget q4_revenue 1200Both region and product are preserved for every row. The four quarterly columns are collapsed into two columns.
Cleaning Up After Melt
After melting, the variable column often contains strings you want to clean. In the example above, the quarter column has values like q1_revenue instead of just Q1. Use string operations to fix this:
long_sales['quarter'] = long_sales['quarter'].str.replace('_revenue', '').str.upper()
print(long_sales.head())Output:
region product quarter revenue
0 North Widget Q1 1200
1 South Widget Q1 1500
2 North Gadget Q1 800
3 South Gadget Q1 950
4 North Widget Q2 1400Melting Selected Columns Only
Sometimes you only want to melt a subset of columns. Specify them explicitly in value_vars:
survey = pd.DataFrame({
'respondent': ['R1', 'R2', 'R3'],
'age': [25, 34, 42],
'q1_satisfaction': [4, 5, 3],
'q2_satisfaction': [3, 4, 5],
'q3_satisfaction': [5, 3, 4],
'income': [50000, 75000, 60000]
})
# Only melt the satisfaction columns, keep age and income as identifiers
long_survey = survey.melt(
id_vars=['respondent', 'age', 'income'],
value_vars=['q1_satisfaction', 'q2_satisfaction', 'q3_satisfaction'],
var_name='question',
value_name='rating'
)
print(long_survey)Output:
respondent age income question rating
0 R1 25 50000 q1_satisfaction 4
1 R2 34 75000 q1_satisfaction 5
2 R3 42 60000 q1_satisfaction 3
3 R1 25 50000 q2_satisfaction 3
4 R2 34 75000 q2_satisfaction 4
5 R3 42 60000 q2_satisfaction 5
6 R1 25 50000 q3_satisfaction 5
7 R2 34 75000 q3_satisfaction 3
8 R3 42 60000 q3_satisfaction 4The income column is kept as an identifier even though it is not part of the melted columns.
Real-World Example: Time Series Data
Financial and economic data often arrives in wide format with dates as column headers. Melting converts this into a plottable time series:
import pandas as pd
gdp = pd.DataFrame({
'country': ['USA', 'UK', 'Germany'],
'2020': [20.94, 2.71, 3.89],
'2021': [23.00, 3.12, 4.26],
'2022': [25.46, 3.07, 4.07],
'2023': [27.36, 3.33, 4.46]
})
gdp_long = gdp.melt(id_vars=['country'], var_name='year', value_name='gdp_trillion_usd')
gdp_long['year'] = gdp_long['year'].astype(int)
gdp_long = gdp_long.sort_values(['country', 'year']).reset_index(drop=True)
print(gdp_long)Output:
country year gdp_trillion_usd
0 Germany 2020 3.89
1 Germany 2021 4.26
2 Germany 2022 4.07
3 Germany 2023 4.46
4 UK 2020 2.71
5 UK 2021 3.12
6 UK 2022 3.07
7 UK 2023 3.33
8 USA 2020 20.94
9 USA 2021 23.00
10 USA 2022 25.46
11 USA 2023 27.36Now you can easily plot GDP over time, group by country, or calculate year-over-year growth. Note how we chain sort_values and reset_index() to produce a clean, sequential index after melting.
Melting with MultiIndex Columns
If your DataFrame has multi-level column headers, use the col_level parameter to specify which level to melt:
arrays = [['score', 'score', 'attendance', 'attendance'],
['midterm', 'final', 'midterm', 'final']]
columns = pd.MultiIndex.from_arrays(arrays, names=['metric', 'exam'])
data = pd.DataFrame([[85, 90, 95, 100], [78, 82, 90, 88]],
index=['Alice', 'Bob'], columns=columns)
# Melt the top level
melted = data.melt(col_level=0, var_name='metric', value_name='value', ignore_index=False)
print(melted)For complex multi-level scenarios, you may need to flatten columns first using droplevel() or by joining levels with an underscore before melting.
Melt vs Pivot: Inverse Operations
melt() and pivot() are inverse operations. Melt converts wide to long. Pivot converts long to wide.
import pandas as pd
# Start wide
wide = pd.DataFrame({
'name': ['Alice', 'Bob'],
'math': [90, 78],
'science': [85, 92]
})
# Melt: wide -> long
long = wide.melt(id_vars='name', var_name='subject', value_name='score')
print("Long format:")
print(long)
# Pivot: long -> wide (round-trip)
back_to_wide = long.pivot(index='name', columns='subject', values='score').reset_index()
back_to_wide.columns.name = None
print("\nBack to wide format:")
print(back_to_wide)Output:
Long format:
name subject score
0 Alice math 90
1 Bob math 78
2 Alice science 85
3 Bob science 92
Back to wide format:
name math science
0 Alice 90 85
1 Bob 78 92Key difference: pivot() requires unique index-column combinations. If your long data has duplicates, use pivot_table() with an aggregation function instead.
Melt vs Stack vs wide_to_long
Pandas provides several functions for reshaping. Here is when to use each:
| Function | Direction | Best For | Key Difference |
|---|---|---|---|
melt() | wide to long | Unpivoting specific columns into rows | Column-based; most intuitive for beginners |
stack() | wide to long | Collapsing column levels into index levels | Index-based; works with MultiIndex |
wide_to_long() | wide to long | Columns with a common prefix and numeric suffix (e.g., score1, score2) | Automatically parses stub names |
pivot() | long to wide | Spreading values into columns (unique keys) | Inverse of melt() |
unstack() | long to wide | Expanding index levels into columns | Inverse of stack() |
When to Use stack() Instead
stack() operates on the column index and pushes it into the row index. It is most useful when you are already working with a MultiIndex and want to reshape at the index level:
wide = pd.DataFrame({
'math': [90, 78],
'science': [85, 92]
}, index=['Alice', 'Bob'])
stacked = wide.stack()
print(stacked)Output:
Alice math 90
science 85
Bob math 78
science 92
dtype: int64The result is a Series with a MultiIndex, not a DataFrame with named columns like melt() produces. Use melt() when you want a clean DataFrame output with explicitly named columns.
When to Use wide_to_long()
wide_to_long() is designed for columns that follow a naming pattern like score1, score2, score3:
df = pd.DataFrame({
'student': ['Alice', 'Bob'],
'score1': [90, 78],
'score2': [85, 92],
'score3': [88, 80]
})
long = pd.wide_to_long(df, stubnames='score', i='student', j='exam_num')
print(long.reset_index())Output:
student exam_num score
0 Alice 1 90
1 Alice 2 85
2 Alice 3 88
3 Bob 1 78
4 Bob 2 92
5 Bob 3 80Use wide_to_long() when your columns have a consistent prefix-suffix pattern. Otherwise, melt() is more flexible.
Performance Considerations
For most datasets (under a few million rows), melt() is fast enough. Here are benchmarks on a DataFrame with 100,000 rows and 50 columns being melted:
| Operation | Approximate Time |
|---|---|
melt() with 50 value columns | ~15 ms |
stack() equivalent | ~10 ms |
Manual loop with concat() | ~500 ms |
Tips for better performance:
- Specify
value_varsexplicitly -- melting only the columns you need is faster than melting all of them. - Use
ignore_index=True(the default) -- preserving the original index adds overhead. - Avoid melting and immediately re-pivoting -- if you need a different wide format, consider using
pivot_table()orrename()directly instead of a round-trip. - For very large DataFrames (100M+ rows after melt), consider using polars or Dask, which offer lazy evaluation and parallel processing.
# Only melt the columns you actually need
long = df.melt(
id_vars=['id'],
value_vars=['col_a', 'col_b', 'col_c'], # Not all 50 columns
var_name='metric',
value_name='reading'
)Common Errors and How to Fix Them
1. KeyError: Column Not Found
This happens when a column name in id_vars or value_vars does not exist in the DataFrame:
# Wrong: column name has a typo
long = df.melt(id_vars=['stduent']) # KeyError
# Fix: check column names first
print(df.columns.tolist())2. Duplicate Rows You Did Not Expect
Melt itself does not create duplicate rows -- it creates one row per id-variable combination. If you see duplicates, it means your original data had duplicate identifier rows:
# Check for duplicates in your id columns
print(df.duplicated(subset=['student']).sum())3. Mixed Data Types in the Value Column
When you melt columns of different dtypes (e.g., some are int64 and some are float64), pandas upcasts the value column to the most general type. If you melt a mix of numeric and string columns, the value column becomes object dtype:
df = pd.DataFrame({
'id': [1, 2],
'score': [90, 85],
'grade': ['A', 'B']
})
long = df.melt(id_vars='id')
print(long.dtypes)
# variable object
# value object <-- both score and grade become objectTo avoid this, melt numeric and string columns separately.
Visualize Melted Data with PyGWalker
After reshaping your data from wide to long format, the natural next step is exploring the result visually -- checking distributions, comparing groups, or spotting outliers. PyGWalker (opens in a new tab) is an open-source Python library that turns any pandas DataFrame into an interactive, Tableau-like visual exploration interface inside Jupyter Notebook.
import pandas as pd
import pygwalker as pyg
# Melt your wide data into long format
grades = pd.DataFrame({
'student': ['Alice', 'Bob', 'Charlie', 'Diana'],
'math': [90, 78, 85, 92],
'science': [85, 92, 88, 79],
'english': [88, 80, 91, 84]
})
long = grades.melt(id_vars='student', var_name='subject', value_name='score')
# Launch interactive visualization
walker = pyg.walk(long)With PyGWalker, you can drag subject to the x-axis and score to the y-axis, then color by student to instantly compare performance across subjects -- no chart code needed. It supports bar charts, scatter plots, box plots, and more, all through drag-and-drop.
Try PyGWalker in Google Colab (opens in a new tab), Kaggle (opens in a new tab), or install with
pip install pygwalker.
FAQ
What does pandas melt do?
Pandas melt() reshapes a DataFrame from wide format to long format. It takes columns and turns them into rows, creating two new columns: one for the original column names (variable) and one for the values. This is also known as "unpivoting."
What is the difference between melt and pivot in pandas?
melt() converts wide format to long format (columns become rows). pivot() does the opposite -- it converts long format to wide format (rows become columns). They are inverse operations. If you melt a DataFrame and then pivot the result using the same parameters, you get back the original DataFrame.
When should I use melt vs stack in pandas?
Use melt() when you want a clean DataFrame with named columns and explicit control over which columns to unpivot. Use stack() when you are working with MultiIndex columns and want to push column levels into the row index. melt() is more intuitive for beginners; stack() is more powerful for hierarchical reshaping.
How do I melt multiple columns in pandas?
Pass a list of column names to the value_vars parameter: df.melt(id_vars=['id'], value_vars=['col_a', 'col_b', 'col_c']). All listed columns will be unpivoted into rows. If you omit value_vars, pandas melts all columns not listed in id_vars.
Can I melt a DataFrame with duplicate column names?
Pandas will melt the DataFrame, but the result may be confusing because the variable column will have duplicate values. Rename the duplicate columns first using df.columns = [...] or df.rename() to avoid ambiguity.
How do I reverse a melt operation in pandas?
Use pivot() or pivot_table() to convert the melted (long) data back to wide format: long.pivot(index='id', columns='variable', values='value'). Use pivot_table() if there are duplicate index-column combinations that require aggregation.
Conclusion
The pandas melt() function is the standard way to convert wide-format DataFrames into long (tidy) format in Python. Key takeaways:
- Use
id_varsto specify which columns to keep as identifiers. - Use
value_varsto control which columns get melted. Omit it to melt everything except the identifiers. - Use
var_nameandvalue_nameto give meaningful names to the output columns. - Melt and pivot are inverses -- use
melt()to go wide-to-long andpivot()to go long-to-wide. - Choose
melt()overstack()when you want a flat DataFrame with explicit column names rather than a MultiIndex Series. - Clean up after melting -- use string methods on the variable column to strip prefixes, suffixes, or reformat values.
After reshaping your data, tools like PyGWalker (opens in a new tab) let you visually explore the result without writing chart code, making the analysis workflow faster and more intuitive.
Related Guides
- Pandas Pivot Table: Summarize and Reshape Data
- Pandas GroupBy: Aggregation, Transform, Apply
- Pandas Rename Column
- Pandas reset_index(): Resetting DataFrame Index