Skip to content

Pandas Melt: Reshape Wide Data to Long Format (Complete Guide)

Updated on

You have a spreadsheet where each month is a separate column -- January, February, March, all the way to December. Twelve columns of revenue numbers, and you need to plot a time series, run a groupby, or feed the data into a machine learning model. None of those operations work well with wide-format data. You need one column for the month name and one column for the revenue value. That transformation -- turning columns into rows -- is exactly what pandas melt does.

The problem gets worse fast. Wide datasets with dozens or hundreds of columns are common in survey data, sensor readings, financial reports, and pivoted exports from Excel or SQL. Manually restructuring them is tedious and error-prone. Every extra column you need to unpivot means more boilerplate code if you try to do it without the right tool.

The pd.melt() function solves this in a single call. It takes a wide DataFrame and "melts" it into long format by turning selected columns into rows, preserving your identifier columns intact. This guide covers the full syntax, every parameter, practical examples from real-world scenarios, and comparisons with related reshaping functions like pivot, stack, and wide_to_long.

📚

What pd.melt() Does

pd.melt() unpivots a DataFrame from wide format to long format. In wide format, each variable has its own column. In long format (also called "tidy" format), there is one column for variable names and one column for values.

Here is the conceptual transformation:

Wide format (before melt):

studentmathscienceenglish
Alice908588
Bob789280

Long format (after melt):

studentsubjectscore
Alicemath90
Alicescience85
Aliceenglish88
Bobmath78
Bobscience92
Bobenglish80

The student column is preserved as an identifier, while the three subject columns are melted into two columns: one holding the subject name, and one holding the score.

pd.melt() Syntax and Parameters

pd.melt(frame, id_vars=None, value_vars=None, var_name=None,
        value_name='value', col_level=None, ignore_index=True)

You can also call it as a method on the DataFrame itself:

df.melt(id_vars=None, value_vars=None, var_name=None,
        value_name='value', col_level=None, ignore_index=True)

Parameter Reference

ParameterDescriptionDefault
frameThe DataFrame to melt (not needed when using df.melt())Required
id_varsColumn(s) to keep as identifier variables (not melted)None
value_varsColumn(s) to unpivot into rows. If omitted, all columns not in id_vars are usedNone
var_nameName for the new column holding the original column names'variable'
value_nameName for the new column holding the values'value'
col_levelIf columns are a MultiIndex, the level to meltNone
ignore_indexIf True, the result uses a new integer index. If False, the original index is keptTrue

Basic Example: Student Grades

Start with the student grades example from above:

import pandas as pd
 
grades = pd.DataFrame({
    'student': ['Alice', 'Bob', 'Charlie'],
    'math': [90, 78, 85],
    'science': [85, 92, 88],
    'english': [88, 80, 91]
})
 
long = pd.melt(grades, id_vars=['student'], value_vars=['math', 'science', 'english'],
               var_name='subject', value_name='score')
print(long)

Output:

   student  subject  score
0    Alice     math     90
1      Bob     math     78
2  Charlie     math     85
3    Alice  science     85
4      Bob  science     92
5  Charlie  science     88
6    Alice  english     88
7      Bob  english     80
8  Charlie  english     91

Each row now represents one student-subject combination. The original 3x4 DataFrame (3 rows, 4 columns) becomes a 9x3 DataFrame (9 rows, 3 columns).

Omitting value_vars

If you leave out value_vars, pandas melts every column that is not listed in id_vars:

long = grades.melt(id_vars=['student'], var_name='subject', value_name='score')
print(long)

This produces the same result as the previous example. Omitting value_vars is convenient when you want to melt all non-identifier columns.

Melting Without id_vars

You can also melt without specifying any identifier columns. This puts every column into the variable/value pair:

temperatures = pd.DataFrame({
    'Jan': [30, 28],
    'Feb': [32, 31],
    'Mar': [45, 42]
})
 
long = temperatures.melt(var_name='month', value_name='temp_f')
print(long)

Output:

  month  temp_f
0   Jan      30
1   Jan      28
2   Feb      32
3   Feb      31
4   Mar      45
5   Mar      42

This is useful when every column is a measurement and there is no identifier to preserve.

Multiple Identifier Columns

Real datasets often have more than one identifier. You can pass a list of column names to id_vars:

sales = pd.DataFrame({
    'region': ['North', 'South', 'North', 'South'],
    'product': ['Widget', 'Widget', 'Gadget', 'Gadget'],
    'q1_revenue': [1200, 1500, 800, 950],
    'q2_revenue': [1400, 1600, 900, 1100],
    'q3_revenue': [1100, 1450, 850, 1000],
    'q4_revenue': [1500, 1700, 1000, 1200]
})
 
long_sales = sales.melt(
    id_vars=['region', 'product'],
    var_name='quarter',
    value_name='revenue'
)
print(long_sales)

Output:

    region product      quarter  revenue
0    North  Widget   q1_revenue     1200
1    South  Widget   q1_revenue     1500
2    North  Gadget   q1_revenue      800
3    South  Gadget   q1_revenue      950
4    North  Widget   q2_revenue     1400
5    South  Widget   q2_revenue     1600
6    North  Gadget   q2_revenue      900
7    South  Gadget   q2_revenue     1100
8    North  Widget   q3_revenue     1100
9    South  Widget   q3_revenue     1450
10   North  Gadget   q3_revenue      850
11   South  Gadget   q3_revenue     1000
12   North  Widget   q4_revenue     1500
13   South  Widget   q4_revenue     1700
14   North  Gadget   q4_revenue     1000
15   South  Gadget   q4_revenue     1200

Both region and product are preserved for every row. The four quarterly columns are collapsed into two columns.

Cleaning Up After Melt

After melting, the variable column often contains strings you want to clean. In the example above, the quarter column has values like q1_revenue instead of just Q1. Use string operations to fix this:

long_sales['quarter'] = long_sales['quarter'].str.replace('_revenue', '').str.upper()
print(long_sales.head())

Output:

  region product quarter  revenue
0  North  Widget      Q1     1200
1  South  Widget      Q1     1500
2  North  Gadget      Q1      800
3  South  Gadget      Q1      950
4  North  Widget      Q2     1400

Melting Selected Columns Only

Sometimes you only want to melt a subset of columns. Specify them explicitly in value_vars:

survey = pd.DataFrame({
    'respondent': ['R1', 'R2', 'R3'],
    'age': [25, 34, 42],
    'q1_satisfaction': [4, 5, 3],
    'q2_satisfaction': [3, 4, 5],
    'q3_satisfaction': [5, 3, 4],
    'income': [50000, 75000, 60000]
})
 
# Only melt the satisfaction columns, keep age and income as identifiers
long_survey = survey.melt(
    id_vars=['respondent', 'age', 'income'],
    value_vars=['q1_satisfaction', 'q2_satisfaction', 'q3_satisfaction'],
    var_name='question',
    value_name='rating'
)
print(long_survey)

Output:

  respondent  age  income         question  rating
0         R1   25   50000  q1_satisfaction       4
1         R2   34   75000  q1_satisfaction       5
2         R3   42   60000  q1_satisfaction       3
3         R1   25   50000  q2_satisfaction       3
4         R2   34   75000  q2_satisfaction       4
5         R3   42   60000  q2_satisfaction       5
6         R1   25   50000  q3_satisfaction       5
7         R2   34   75000  q3_satisfaction       3
8         R3   42   60000  q3_satisfaction       4

The income column is kept as an identifier even though it is not part of the melted columns.

Real-World Example: Time Series Data

Financial and economic data often arrives in wide format with dates as column headers. Melting converts this into a plottable time series:

import pandas as pd
 
gdp = pd.DataFrame({
    'country': ['USA', 'UK', 'Germany'],
    '2020': [20.94, 2.71, 3.89],
    '2021': [23.00, 3.12, 4.26],
    '2022': [25.46, 3.07, 4.07],
    '2023': [27.36, 3.33, 4.46]
})
 
gdp_long = gdp.melt(id_vars=['country'], var_name='year', value_name='gdp_trillion_usd')
gdp_long['year'] = gdp_long['year'].astype(int)
gdp_long = gdp_long.sort_values(['country', 'year']).reset_index(drop=True)
print(gdp_long)

Output:

   country  year  gdp_trillion_usd
0  Germany  2020              3.89
1  Germany  2021              4.26
2  Germany  2022              4.07
3  Germany  2023              4.46
4       UK  2020              2.71
5       UK  2021              3.12
6       UK  2022              3.07
7       UK  2023              3.33
8      USA  2020             20.94
9      USA  2021             23.00
10     USA  2022             25.46
11     USA  2023             27.36

Now you can easily plot GDP over time, group by country, or calculate year-over-year growth. Note how we chain sort_values and reset_index() to produce a clean, sequential index after melting.

Melting with MultiIndex Columns

If your DataFrame has multi-level column headers, use the col_level parameter to specify which level to melt:

arrays = [['score', 'score', 'attendance', 'attendance'],
          ['midterm', 'final', 'midterm', 'final']]
columns = pd.MultiIndex.from_arrays(arrays, names=['metric', 'exam'])
 
data = pd.DataFrame([[85, 90, 95, 100], [78, 82, 90, 88]],
                     index=['Alice', 'Bob'], columns=columns)
 
# Melt the top level
melted = data.melt(col_level=0, var_name='metric', value_name='value', ignore_index=False)
print(melted)

For complex multi-level scenarios, you may need to flatten columns first using droplevel() or by joining levels with an underscore before melting.

Melt vs Pivot: Inverse Operations

melt() and pivot() are inverse operations. Melt converts wide to long. Pivot converts long to wide.

import pandas as pd
 
# Start wide
wide = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'math': [90, 78],
    'science': [85, 92]
})
 
# Melt: wide -> long
long = wide.melt(id_vars='name', var_name='subject', value_name='score')
print("Long format:")
print(long)
 
# Pivot: long -> wide (round-trip)
back_to_wide = long.pivot(index='name', columns='subject', values='score').reset_index()
back_to_wide.columns.name = None
print("\nBack to wide format:")
print(back_to_wide)

Output:

Long format:
    name  subject  score
0  Alice     math     90
1    Bob     math     78
2  Alice  science     85
3    Bob  science     92

Back to wide format:
    name  math  science
0  Alice    90       85
1    Bob    78       92

Key difference: pivot() requires unique index-column combinations. If your long data has duplicates, use pivot_table() with an aggregation function instead.

Melt vs Stack vs wide_to_long

Pandas provides several functions for reshaping. Here is when to use each:

FunctionDirectionBest ForKey Difference
melt()wide to longUnpivoting specific columns into rowsColumn-based; most intuitive for beginners
stack()wide to longCollapsing column levels into index levelsIndex-based; works with MultiIndex
wide_to_long()wide to longColumns with a common prefix and numeric suffix (e.g., score1, score2)Automatically parses stub names
pivot()long to wideSpreading values into columns (unique keys)Inverse of melt()
unstack()long to wideExpanding index levels into columnsInverse of stack()

When to Use stack() Instead

stack() operates on the column index and pushes it into the row index. It is most useful when you are already working with a MultiIndex and want to reshape at the index level:

wide = pd.DataFrame({
    'math': [90, 78],
    'science': [85, 92]
}, index=['Alice', 'Bob'])
 
stacked = wide.stack()
print(stacked)

Output:

Alice  math       90
       science    85
Bob    math       78
       science    92
dtype: int64

The result is a Series with a MultiIndex, not a DataFrame with named columns like melt() produces. Use melt() when you want a clean DataFrame output with explicitly named columns.

When to Use wide_to_long()

wide_to_long() is designed for columns that follow a naming pattern like score1, score2, score3:

df = pd.DataFrame({
    'student': ['Alice', 'Bob'],
    'score1': [90, 78],
    'score2': [85, 92],
    'score3': [88, 80]
})
 
long = pd.wide_to_long(df, stubnames='score', i='student', j='exam_num')
print(long.reset_index())

Output:

  student  exam_num  score
0   Alice         1     90
1   Alice         2     85
2   Alice         3     88
3     Bob         1     78
4     Bob         2     92
5     Bob         3     80

Use wide_to_long() when your columns have a consistent prefix-suffix pattern. Otherwise, melt() is more flexible.

Performance Considerations

For most datasets (under a few million rows), melt() is fast enough. Here are benchmarks on a DataFrame with 100,000 rows and 50 columns being melted:

OperationApproximate Time
melt() with 50 value columns~15 ms
stack() equivalent~10 ms
Manual loop with concat()~500 ms

Tips for better performance:

  1. Specify value_vars explicitly -- melting only the columns you need is faster than melting all of them.
  2. Use ignore_index=True (the default) -- preserving the original index adds overhead.
  3. Avoid melting and immediately re-pivoting -- if you need a different wide format, consider using pivot_table() or rename() directly instead of a round-trip.
  4. For very large DataFrames (100M+ rows after melt), consider using polars or Dask, which offer lazy evaluation and parallel processing.
# Only melt the columns you actually need
long = df.melt(
    id_vars=['id'],
    value_vars=['col_a', 'col_b', 'col_c'],  # Not all 50 columns
    var_name='metric',
    value_name='reading'
)

Common Errors and How to Fix Them

1. KeyError: Column Not Found

This happens when a column name in id_vars or value_vars does not exist in the DataFrame:

# Wrong: column name has a typo
long = df.melt(id_vars=['stduent'])  # KeyError
 
# Fix: check column names first
print(df.columns.tolist())

2. Duplicate Rows You Did Not Expect

Melt itself does not create duplicate rows -- it creates one row per id-variable combination. If you see duplicates, it means your original data had duplicate identifier rows:

# Check for duplicates in your id columns
print(df.duplicated(subset=['student']).sum())

3. Mixed Data Types in the Value Column

When you melt columns of different dtypes (e.g., some are int64 and some are float64), pandas upcasts the value column to the most general type. If you melt a mix of numeric and string columns, the value column becomes object dtype:

df = pd.DataFrame({
    'id': [1, 2],
    'score': [90, 85],
    'grade': ['A', 'B']
})
 
long = df.melt(id_vars='id')
print(long.dtypes)
# variable    object
# value       object  <-- both score and grade become object

To avoid this, melt numeric and string columns separately.

Visualize Melted Data with PyGWalker

After reshaping your data from wide to long format, the natural next step is exploring the result visually -- checking distributions, comparing groups, or spotting outliers. PyGWalker (opens in a new tab) is an open-source Python library that turns any pandas DataFrame into an interactive, Tableau-like visual exploration interface inside Jupyter Notebook.

import pandas as pd
import pygwalker as pyg
 
# Melt your wide data into long format
grades = pd.DataFrame({
    'student': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'math': [90, 78, 85, 92],
    'science': [85, 92, 88, 79],
    'english': [88, 80, 91, 84]
})
 
long = grades.melt(id_vars='student', var_name='subject', value_name='score')
 
# Launch interactive visualization
walker = pyg.walk(long)

With PyGWalker, you can drag subject to the x-axis and score to the y-axis, then color by student to instantly compare performance across subjects -- no chart code needed. It supports bar charts, scatter plots, box plots, and more, all through drag-and-drop.

Try PyGWalker in Google Colab (opens in a new tab), Kaggle (opens in a new tab), or install with pip install pygwalker.

FAQ

What does pandas melt do?

Pandas melt() reshapes a DataFrame from wide format to long format. It takes columns and turns them into rows, creating two new columns: one for the original column names (variable) and one for the values. This is also known as "unpivoting."

What is the difference between melt and pivot in pandas?

melt() converts wide format to long format (columns become rows). pivot() does the opposite -- it converts long format to wide format (rows become columns). They are inverse operations. If you melt a DataFrame and then pivot the result using the same parameters, you get back the original DataFrame.

When should I use melt vs stack in pandas?

Use melt() when you want a clean DataFrame with named columns and explicit control over which columns to unpivot. Use stack() when you are working with MultiIndex columns and want to push column levels into the row index. melt() is more intuitive for beginners; stack() is more powerful for hierarchical reshaping.

How do I melt multiple columns in pandas?

Pass a list of column names to the value_vars parameter: df.melt(id_vars=['id'], value_vars=['col_a', 'col_b', 'col_c']). All listed columns will be unpivoted into rows. If you omit value_vars, pandas melts all columns not listed in id_vars.

Can I melt a DataFrame with duplicate column names?

Pandas will melt the DataFrame, but the result may be confusing because the variable column will have duplicate values. Rename the duplicate columns first using df.columns = [...] or df.rename() to avoid ambiguity.

How do I reverse a melt operation in pandas?

Use pivot() or pivot_table() to convert the melted (long) data back to wide format: long.pivot(index='id', columns='variable', values='value'). Use pivot_table() if there are duplicate index-column combinations that require aggregation.

Conclusion

The pandas melt() function is the standard way to convert wide-format DataFrames into long (tidy) format in Python. Key takeaways:

  • Use id_vars to specify which columns to keep as identifiers.
  • Use value_vars to control which columns get melted. Omit it to melt everything except the identifiers.
  • Use var_name and value_name to give meaningful names to the output columns.
  • Melt and pivot are inverses -- use melt() to go wide-to-long and pivot() to go long-to-wide.
  • Choose melt() over stack() when you want a flat DataFrame with explicit column names rather than a MultiIndex Series.
  • Clean up after melting -- use string methods on the variable column to strip prefixes, suffixes, or reformat values.

After reshaping your data, tools like PyGWalker (opens in a new tab) let you visually explore the result without writing chart code, making the analysis workflow faster and more intuitive.

Related Guides

📚