How to concat two Pandas DataFrames: Explained!
Updated on
Pandas is one of the most popular open-source tools for Data Science. It provides DataFrame structures that offer high-level performance for tabular data manipulation, cleaning, analysis, and visualization. In Pandas, it is common to have to merge two or more DataFrames to achieve some kind of analysis on the data. We can achieve this by combining the DataFrames either horizontally or vertically. Here, we will illustrate and discuss both methods in detail.
Want to quickly create Data Visualization from Python Pandas Dataframe with No code?
PyGWalker is a Python library for Exploratory Data Analysis with Visualization. PyGWalker (opens in a new tab) can simplify your Jupyter Notebook data analysis and data visualization workflow, by turning your pandas dataframe (and polars dataframe) into a tableau-alternative User Interface for visual exploration.
Concatenating Two DataFrames Vertically
In Pandas, two DataFrames can be concatenated using the concat()
method. To demonstrate this, we will start by creating two sample DataFrames. In the first sample DataFrame, let's say we have information on some employees in a company:
import pandas as pd
# Creating DataFrame 1
df1 = pd.DataFrame({
'Name': ['John', 'Jack', 'Steve', 'Sarah'],
'Age': [24, 32, 19, 29],
'Gender': ['M', 'M', 'M', 'F']
})
This will create a DataFrame that looks like this:
Name Age Gender
0 John 24 M
1 Jack 32 M
2 Steve 19 M
3 Sarah 29 F
Let's say we have another DataFrame that contains information on the departments in the company:
# Creating DataFrame 2
df2 = pd.DataFrame({
'Department': ['Marketing', 'Sales', 'Human Resources'],
'Employees': [15, 12, 10],
})
This will create a DataFrame that looks like this:
Department Employees
0 Marketing 15
1 Sales 12
2 Human Resources 10
Now, we can use the concat()
method to combine the two DataFrames vertically:
# Concatenating vertically
df3 = pd.concat([df1, df2], axis=0)
Here, the axis=0
parameter denotes that we want to concatenate the DataFrames by stacking them on top of each other (i.e., vertically). After concatenation, we get the following output:
Name Age Gender Department Employees
0 John 24 M NaN NaN
1 Jack 32 M NaN NaN
2 Steve 19 M NaN NaN
3 Sarah 29 F NaN NaN
0 NaN NaN NaN Marketing 15
1 NaN NaN NaN Sales 12
2 NaN NaN NaN Human Resources 10
We can see that the two DataFrames are concatenated as desired, but there are some NaN
(null) values in the new DataFrame where the column names don't match. We can drop those rows if they don't fit our data analysis, or we can provide some default values to fill the null values.
Concatenating Two DataFrames Horizontally
We can also concatenate two DataFrames horizontally (i.e., combine them side-by-side) using the concat()
method, like so:
# Concatenating horizontally
df4 = pd.concat([df1, df2], axis=1)
Here, the axis=1
parameter denotes that we want to concatenate the DataFrames by putting them beside each other (i.e., horizontally). After concatenation, we get the following output:
Name Age Gender Department Employees
0 John 24 M Marketing 15
1 Jack 32 M Sales 12
2 Steve 19 M Human Resources 10
3 Sarah 29 F NaN NaN
We can see that the two DataFrames are concatenated horizontally as desired. However, we have some NaN
(null) values in the new DataFrame where the row names don't match. Again, we can drop those rows if they don't fit our data analysis, or we can provide some default values to fill the null values.
Concatenating DataFrames with Different Columns
What happens if the two DataFrames to be concatenated have different columns? In such cases, Pandas will identify the non-matching column names and add them to the new DataFrame as separate columns. Let's illustrate this with an example:
# Creating DataFrame with different columns
df5 = pd.DataFrame({
'Employee Name': ['John', 'Jack', 'Steve', 'Sarah'],
'Employee Age': [24, 32, 19, 29],
'Job Title': ['Manager', 'Assistant', 'Clerk', 'Executive']
})
This will create a DataFrame that looks like this:
Employee Name Employee Age Job Title
0 John 24 Manager
1 Jack 32 Assistant
2 Steve 19 Clerk
3 Sarah 29 Executive
Now we can concatenate df1
and df5
:
# Concatenate df1 and df5
df6 = pd.concat([df1, df5], axis=1)
After concatenation, we get the following output:
Name Age Gender Employee Name Employee Age Job Title
0 John 24 M John 24 Manager
1 Jack 32 M Jack 32 Assistant
2 Steve 19 M Steve 19 Clerk
3 Sarah 29 F Sarah 29 Executive
As we can see, Pandas concatenates the two DataFrames by adding the columns that don't match as separate columns in the new DataFrame.
Ignoring the Index When Concatenating DataFrames Horizontally
When concatenating two DataFrames horizontally, the resulting DataFrame retains the original indices of the two DataFrames. This can cause problems while working with this new DataFrame. Thus, it may be necessary in some cases to ignore the index while concatenating horizontally. We can accomplish this by setting the ignore_index
parameter to True
while concatenating:
# Concatenate df1 and df2, ignoring the index
df7 = pd.concat([df1, df2], axis=1, ignore_index=True)
After concatenation, we get the following output:
0 1 2 3 4
0 John 24 M Marketing 15
1 Jack 32 M Sales 12
2 Steve 19 M Human Resources 10
3 Sarah 29 F NaN NaN
Now the new DataFrame has a new index, which is more convenient for further data analysis.
Conclusion
Concatenating Pandas DataFrames vertically or horizontally provides us with an efficient way to merge data for analysis. With this guide, we have covered the basics of concatenation using the concat()
method. With some practice and more knowledge of Pandas DataFrames, you'll be able to take large amounts of data and turn them into insights. If you're interested in learning more about Pandas, check out our Pandas tutorial pages below: