Pandas Concat: How to Concatenate DataFrames in Python

Real-world data rarely lives in a single file. You pull January sales from one CSV, February from another, and Q1 targets from a third. You scrape multiple web pages into separate DataFrames. You split a large dataset for parallel processing and need to reassemble the pieces. In every case, you need a reliable way to combine DataFrames without losing rows, scrambling columns, or corrupting your index.

The pandas concat function (pd.concat()) is the standard tool for this job. It stacks DataFrames vertically (adding rows) or horizontally (adding columns), handles mismatched columns gracefully, and scales to any number of DataFrames in a single call. This guide covers every parameter you need, with working code examples you can paste directly into your notebook.

What pd.concat() Does -- Basic Syntax

pd.concat() takes a list (or dict) of DataFrames and glues them together along a specified axis. Think of it as stacking building blocks -- vertically to add more rows, or side by side to add more columns.

import pandas as pd
 
result = pd.concat(
    objs,              # list or dict of DataFrames
    axis=0,            # 0 = vertical (rows), 1 = horizontal (columns)
    join='outer',      # 'outer' or 'inner'
    ignore_index=False,
    keys=None,
    sort=False,
    verify_integrity=False
)

Key Parameters at a Glance

| Parameter | Description | Default |
| --- | --- | --- |
| objs | A list or dict of DataFrames (or Series) to concatenate | Required |
| axis | 0 to stack rows (vertical), 1 to stack columns (horizontal) | 0 |
| join | 'outer' keeps all columns; 'inner' keeps only shared columns | 'outer' |
| ignore_index | If True, resets the index to 0, 1, 2, ... in the result | False |
| keys | Labels to identify which original DataFrame each row came from | None |
| sort | Sort the non-concatenation axis (column names for axis=0) | False |
| verify_integrity | Raise an error if the result has duplicate index values | False |

Sample Data for All Examples

Every example below uses these DataFrames:

import pandas as pd
 
df_jan = pd.DataFrame({
    'product': ['Widget', 'Gadget', 'Sprocket'],
    'units_sold': [150, 200, 80],
    'revenue': [1500, 3000, 960]
})
 
df_feb = pd.DataFrame({
    'product': ['Widget', 'Gadget', 'Sprocket'],
    'units_sold': [170, 180, 95],
    'revenue': [1700, 2700, 1140]
})
 
print(df_jan)
print(df_feb)

Output:

    product  units_sold  revenue
0    Widget         150     1500
1    Gadget         200     3000
2  Sprocket          80      960

    product  units_sold  revenue
0    Widget         170     1700
1    Gadget         180     2700
2  Sprocket          95     1140

Concatenating DataFrames Vertically (axis=0)

Vertical concatenation is the most common use case. It stacks one DataFrame on top of another, adding rows. This is what you use when you have monthly files, batch results, or any data split across multiple tables with the same columns.

combined = pd.concat([df_jan, df_feb])
print(combined)

Output:

    product  units_sold  revenue
0    Widget         150     1500
1    Gadget         200     3000
2  Sprocket          80      960
0    Widget         170     1700
1    Gadget         180     2700
2  Sprocket          95     1140

Notice the index: both DataFrames kept their original index values (0, 1, 2), so the result has duplicate index values. This is usually not what you want. The fix is the ignore_index parameter, covered next.

The ignore_index Parameter -- Resetting the Index

Setting ignore_index=True discards the original index and assigns a fresh sequential index starting from 0:

combined = pd.concat([df_jan, df_feb], ignore_index=True)
print(combined)

Output:

    product  units_sold  revenue
0    Widget         150     1500
1    Gadget         200     3000
2  Sprocket          80      960
3    Widget         170     1700
4    Gadget         180     2700
5  Sprocket          95     1140

When to use it: Almost always when concatenating vertically. Unless your index carries meaningful information (like timestamps or unique IDs), reset it to avoid confusion later.

The keys Parameter -- Creating a Hierarchical Index

The keys parameter adds a level to the index that labels which source each row came from. This creates a MultiIndex (hierarchical index):

combined = pd.concat([df_jan, df_feb], keys=['January', 'February'])
print(combined)

Output:

               product  units_sold  revenue
January  0      Widget         150     1500
         1      Gadget         200     3000
         2    Sprocket          80      960
February 0      Widget         170     1700
         1      Gadget         180     2700
         2    Sprocket          95     1140

You can then select data from a specific source using .loc:

# Get only January data
jan_data = combined.loc['January']
print(jan_data)

Output:

    product  units_sold  revenue
0    Widget         150     1500
1    Gadget         200     3000
2  Sprocket          80      960

When to use it: Use keys when you need to track which original DataFrame each row belongs to -- for example, data from different experiments, time periods, or data sources.
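As a shortcut, you can also pass a dict of DataFrames instead of a list plus keys -- pandas uses the dict keys as the outer index level. A minimal sketch using the same sample data:

```python
import pandas as pd

df_jan = pd.DataFrame({'product': ['Widget', 'Gadget'], 'units_sold': [150, 200]})
df_feb = pd.DataFrame({'product': ['Widget', 'Gadget'], 'units_sold': [170, 180]})

# Dict keys become the outer level of the MultiIndex,
# equivalent to passing keys=['January', 'February']
combined = pd.concat({'January': df_jan, 'February': df_feb})
print(combined.loc['February'])
```

This keeps the label and the DataFrame together in one place, which reads well when the sources are naturally named.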

Concatenating Horizontally (axis=1) -- Side by Side

Setting axis=1 places DataFrames next to each other, adding columns. Pandas aligns rows by their index values.

targets = pd.DataFrame({
    'target_units': [160, 190, 90],
    'target_revenue': [1600, 2850, 1080]
})
 
result = pd.concat([df_jan, targets], axis=1)
print(result)

Output:

    product  units_sold  revenue  target_units  target_revenue
0    Widget         150     1500           160            1600
1    Gadget         200     3000           190            2850
2  Sprocket          80      960            90            1080

This works cleanly because both DataFrames share the same index (0, 1, 2). If the indexes do not align, you get NaN values for the mismatched rows:

df_a = pd.DataFrame({'value_a': [10, 20, 30]}, index=[0, 1, 2])
df_b = pd.DataFrame({'value_b': [40, 50, 60]}, index=[1, 2, 3])
 
result = pd.concat([df_a, df_b], axis=1)
print(result)

Output:

   value_a  value_b
0     10.0      NaN
1     20.0     40.0
2     30.0     50.0
3      NaN     60.0

Row 0 has no value_b (no matching index in df_b), and row 3 has no value_a (no matching index in df_a).

The join Parameter -- Inner vs Outer

The join parameter controls what happens when DataFrames have different columns (for axis=0) or different index values (for axis=1).

outer join (default) -- Keep Everything

df_with_extra = pd.DataFrame({
    'product': ['Widget', 'Gadget'],
    'units_sold': [200, 250],
    'region': ['East', 'West']
})
 
result = pd.concat([df_jan, df_with_extra], join='outer', ignore_index=True)
print(result)

Output:

    product  units_sold  revenue region
0    Widget         150   1500.0    NaN
1    Gadget         200   3000.0    NaN
2  Sprocket          80    960.0    NaN
3    Widget         200      NaN   East
4    Gadget         250      NaN   West

All columns from both DataFrames appear. Missing values are filled with NaN.

inner join -- Keep Only Shared Columns

result = pd.concat([df_jan, df_with_extra], join='inner', ignore_index=True)
print(result)

Output:

   product  units_sold
0   Widget         150
1   Gadget         200
2 Sprocket          80
3   Widget         200
4   Gadget         250

Only columns that exist in both DataFrames survive. The revenue column (missing from df_with_extra) and region column (missing from df_jan) are both dropped.

When to use inner join: When you want a clean result with no NaN values and are willing to lose columns that do not appear in every DataFrame.
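The same logic applies horizontally: with axis=1, join='inner' keeps only the index values present in every DataFrame. A short sketch reusing the misaligned frames from the earlier axis=1 example:

```python
import pandas as pd

df_a = pd.DataFrame({'value_a': [10, 20, 30]}, index=[0, 1, 2])
df_b = pd.DataFrame({'value_b': [40, 50, 60]}, index=[1, 2, 3])

# Only index values 1 and 2 exist in both frames, so only those rows survive
result = pd.concat([df_a, df_b], axis=1, join='inner')
print(result)
```

Rows 0 and 3 are dropped instead of being padded with NaN, giving a clean result at the cost of losing unmatched rows.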

Concatenating a List of DataFrames

One of the biggest advantages of pd.concat() over other combining methods is that it handles any number of DataFrames in a single call. This is the standard pattern for combining files loaded in a loop:

import pandas as pd
 
# Simulate loading monthly CSV files
months = {
    'Jan': {'product': ['Widget', 'Gadget'], 'units': [150, 200]},
    'Feb': {'product': ['Widget', 'Gadget'], 'units': [170, 180]},
    'Mar': {'product': ['Widget', 'Gadget'], 'units': [190, 210]},
}
 
dfs = []
for month, data in months.items():
    df = pd.DataFrame(data)
    df['month'] = month
    dfs.append(df)
 
all_data = pd.concat(dfs, ignore_index=True)
print(all_data)

Output:

  product  units month
0  Widget    150   Jan
1  Gadget    200   Jan
2  Widget    170   Feb
3  Gadget    180   Feb
4  Widget    190   Mar
5  Gadget    210   Mar

This pattern -- build a list of DataFrames, then concat once at the end -- is much faster than appending one DataFrame at a time in a loop. Each append creates a full copy, but a single pd.concat() call allocates memory once.
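In practice the list usually comes from reading files off disk. Here is a sketch of that pattern with glob; the sales_*.csv files are created in a temporary directory just so the example is self-contained:

```python
import glob
import os
import tempfile

import pandas as pd

# Stand-in for real monthly exports: write three small CSV files
tmpdir = tempfile.mkdtemp()
for month, units in [('jan', 150), ('feb', 170), ('mar', 190)]:
    path = os.path.join(tmpdir, f'sales_{month}.csv')
    pd.DataFrame({'product': ['Widget'], 'units': [units]}).to_csv(path, index=False)

# The core pattern: read every matching file into a list, then concat once
paths = sorted(glob.glob(os.path.join(tmpdir, 'sales_*.csv')))
dfs = [pd.read_csv(p) for p in paths]
all_data = pd.concat(dfs, ignore_index=True)
print(all_data)
```

Swap the temporary directory for your real data folder and the rest of the pattern carries over unchanged.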

concat vs merge vs append -- Comparison Table

Pandas offers several ways to combine DataFrames. Choosing the right one depends on how you want to combine them:

| Feature | pd.concat() | pd.merge() | DataFrame.append() |
| --- | --- | --- | --- |
| Primary use | Stack DataFrames (rows or columns) | Join on shared column values (like SQL) | Add rows to a DataFrame |
| Number of inputs | Any number at once | Two at a time | Two at a time |
| Matching logic | Aligns by index (or column names) | Matches on key column values | Aligns by column names |
| Join types | outer, inner | inner, left, right, outer, cross | Outer only |
| Default behavior | Outer join, stack vertically | Inner join on shared columns | Outer join, append rows |
| Best for | Combining monthly files, batch results, same-schema tables | Relational joins (customer + orders) | Nothing -- deprecated in 1.4, removed in 2.0 |
| Performance | Fast for many DataFrames | Optimized for two-table joins | Slow (copies data each call) |

When to Use Each

  • Use pd.concat() when your DataFrames share the same structure (same columns) and you want to stack them. Also use it for horizontal concatenation when aligning by index.
  • Use pd.merge() when you need to match rows based on column values -- like joining a sales table with a products table on product_id. See our pandas merge guide for details.
  • Avoid DataFrame.append() -- it was deprecated in pandas 1.4 and removed in pandas 2.0. Use pd.concat([df1, df2]) instead.

Common Errors and Fixes

1. Columns Do Not Align

When concatenating DataFrames with different column names, the default outer join fills missing values with NaN. If you did not expect this, check your column names:

# Diagnose: compare column names
print(df1.columns.tolist())
print(df2.columns.tolist())
 
# Fix: rename columns to match before concatenating
df2 = df2.rename(columns={'sales': 'revenue', 'qty': 'units_sold'})
combined = pd.concat([df1, df2], ignore_index=True)

2. Dtype Mismatches After Concatenation

If one DataFrame stores a column as int64 and another stores it as float64, pandas upcasts to float. Worse, if one stores it as a string, you get an object dtype column:

# Check dtypes after concat
combined = pd.concat([df1, df2], ignore_index=True)
print(combined.dtypes)
 
# Fix: cast before concatenating
df2['units_sold'] = df2['units_sold'].astype(int)
combined = pd.concat([df1, df2], ignore_index=True)
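A quick way to see the upcasting in action, using small toy frames for illustration:

```python
import pandas as pd

ints = pd.DataFrame({'units_sold': [150, 200]})        # int64
floats = pd.DataFrame({'units_sold': [95.0, 170.5]})   # float64
strings = pd.DataFrame({'units_sold': ['80', '120']})  # object (strings)

# int64 + float64 -> the whole column becomes float64
upcast = pd.concat([ints, floats], ignore_index=True)
print(upcast.dtypes)

# int64 + object -> the whole column becomes object
mixed = pd.concat([ints, strings], ignore_index=True)
print(mixed.dtypes)
```

An object column is the more dangerous case: numeric operations on it silently fail or behave oddly, so checking .dtypes right after a concat is a cheap sanity check.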

3. Duplicated Index Values

Without ignore_index=True, vertical concatenation preserves original indexes, leading to duplicate values. This causes problems with .loc lookups:

combined = pd.concat([df1, df2])
# combined.loc[0] returns TWO rows, not one
 
# Fix option 1: use ignore_index
combined = pd.concat([df1, df2], ignore_index=True)
 
# Fix option 2: use verify_integrity to catch the issue early
combined = pd.concat([df1, df2], verify_integrity=True)  # raises ValueError

4. Accidentally Concatenating Along the Wrong Axis

If your result has twice the columns instead of twice the rows (or vice versa), check the axis parameter:

# Wrong: this adds columns side by side
wrong = pd.concat([df1, df2], axis=1)
 
# Right: this stacks rows vertically
right = pd.concat([df1, df2], axis=0)

Visualize Your Concatenated DataFrames with PyGWalker

After concatenating data from multiple sources, you often need to verify the result and explore patterns across the combined dataset. Instead of writing manual plotting code with matplotlib or seaborn, you can use PyGWalker -- an open-source Python library that turns any pandas DataFrame into an interactive, Tableau-like visual exploration interface directly inside Jupyter Notebook.

import pandas as pd
import pygwalker as pyg
 
# Combine monthly sales data
df_jan = pd.DataFrame({
    'product': ['Widget', 'Gadget', 'Sprocket'],
    'units_sold': [150, 200, 80],
    'revenue': [1500, 3000, 960],
    'month': ['Jan', 'Jan', 'Jan']
})
 
df_feb = pd.DataFrame({
    'product': ['Widget', 'Gadget', 'Sprocket'],
    'units_sold': [170, 180, 95],
    'revenue': [1700, 2700, 1140],
    'month': ['Feb', 'Feb', 'Feb']
})
 
combined = pd.concat([df_jan, df_feb], ignore_index=True)
 
# Launch interactive visualization
walker = pyg.walk(combined)

With PyGWalker, you can drag product to the x-axis and revenue to the y-axis, then split by month to instantly compare revenue trends across periods -- no chart code required. You can create bar charts, scatter plots, line charts, and more just by dragging fields. It is especially useful for verifying that your concatenation worked correctly and that data from different sources aligns as expected.

Install PyGWalker with pip install pygwalker, or try it in Google Colab or Kaggle.

FAQ

What is the difference between pandas concat and merge?

pd.concat() stacks DataFrames vertically (adding rows) or horizontally (adding columns) by aligning on the index. pd.merge() joins two DataFrames by matching values in specific columns, like a SQL JOIN. Use concat when your DataFrames have the same columns and you want to combine rows. Use merge when you need to match rows based on a shared key column.

Does pd.concat() modify the original DataFrames?

No. pd.concat() always returns a new DataFrame. The original DataFrames remain unchanged. This is consistent with the pandas design principle that operations return new objects rather than modifying data in place.
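A quick check confirms the originals stay untouched, even if you edit the result afterwards:

```python
import pandas as pd

df1 = pd.DataFrame({'x': [1, 2]})
df2 = pd.DataFrame({'x': [3, 4]})
original = df1.copy()

combined = pd.concat([df1, df2], ignore_index=True)
combined.loc[0, 'x'] = 999  # modify the result only

# df1 is unchanged by both the concat and the edit to the result
print(df1.equals(original))  # True
```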

How do I concatenate DataFrames with different columns?

Use pd.concat() with the default join='outer' -- it keeps all columns from all DataFrames and fills missing values with NaN. If you only want columns that appear in every DataFrame, set join='inner'. You can also rename columns before concatenating to ensure alignment.

Is pd.concat() faster than DataFrame.append()?

Yes. DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0. It called pd.concat() internally but created a copy on every call. When combining many DataFrames, collecting them in a list and calling pd.concat() once is significantly faster because it allocates memory a single time.

How do I reset the index after concatenation?

Pass ignore_index=True to pd.concat(): pd.concat([df1, df2], ignore_index=True). This replaces the original index values with a new sequential index starting from 0. Alternatively, call .reset_index(drop=True) on the result.

Conclusion

The pandas concat() function is the go-to tool for combining DataFrames that share the same structure. Here are the key takeaways:

  • Vertical concatenation (axis=0) stacks rows and is the most common use case -- ideal for combining monthly files, batch results, or split datasets.
  • Horizontal concatenation (axis=1) places DataFrames side by side, aligning on the index.
  • Use ignore_index=True to get a clean sequential index (recommended in most cases).
  • Use keys to create a hierarchical index that tracks which source each row came from.
  • The join parameter controls how mismatched columns are handled: 'outer' keeps everything, 'inner' keeps only shared columns.
  • Always collect DataFrames in a list and call pd.concat() once, rather than appending in a loop.
  • Use pd.merge() instead when you need SQL-style joins on column values.

Once your data is concatenated, tools like PyGWalker let you visually explore the combined result without writing chart code, making it faster to verify your data pipeline and discover patterns across sources.
