How to Rename Column in Pandas: Clearly Explained
Data analysis is a crucial task in today's data-driven world. It requires cleaning, organizing, and transforming raw data into an understandable and meaningful format. One of the most fundamental tasks in data analysis is column renaming as it makes the data more informative and understandable.
In this tutorial, we will explore how to rename columns in Pandas DataFrame by using different methods. We will discuss the best practices, tips, and tricks to make your data analysis more clear and concise. Let's get started!
Want to quickly create Data Visualization from Python Pandas Dataframe with No code?
PyGWalker is a Python library for Exploratory Data Analysis with Visualization. PyGWalker can simplify your Jupyter Notebook data analysis and data visualization workflow by turning your pandas DataFrame (or polars DataFrame) into a Tableau-alternative user interface for visual exploration.
What is DataFrame Rename Column?
Before diving deep into the coding part, let's first understand what column renaming in Pandas DataFrame is and why it is important.
In a Pandas DataFrame, column names are the labels that distinguish one column from another. Sometimes these labels are uninformative or inconsistent with the data, which can lead to confusion and misinterpretation. In such cases, renaming the columns makes the data more informative and understandable.
Column renaming is the process of changing the name of one or more columns in a Pandas DataFrame. It is done either by mapping old labels to new ones or by replacing the labels according to their position. It improves the readability of the data and helps to understand the relationships between different columns.
How to Rename a Column in Pandas DataFrame?
Pandas provides several ways to rename columns in DataFrame. We will explore the most commonly used methods and best practices to rename columns.
Renaming a Single Column
Let's start with the most basic method of renaming a single column in Pandas DataFrame. We will use the rename method to do that.
# Create a sample DataFrame
import pandas as pd
data = {'Name': ['John', 'Alex', 'Peter'],
'Age': [25, 24, 28],
'Gender': ['Male', 'Male', 'Male']}
df = pd.DataFrame(data)
# Rename the 'Age' column to 'Years'
df = df.rename(columns={'Age': 'Years'})
# Print the DataFrame
print(df)
Output:
Name Years Gender
0 John 25 Male
1 Alex 24 Male
2 Peter 28 Male
Here, we have created a sample DataFrame with columns Name, Age, and Gender. We have used the rename method to change the name of column Age to Years. The rename method takes a dictionary as input where the keys are the old column names, and values are the new column names.
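One detail worth knowing about the dictionary form: by default, rename ignores keys that do not match any column, so a typo in the old name leaves the DataFrame unchanged without any warning. The sketch below illustrates this with a deliberately misspelled key ('age' instead of 'Age') and assumes a pandas version that supports the errors argument of rename. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alex', 'Peter'],
                   'Age': [25, 24, 28]})

# A misspelled key is silently ignored by default: nothing is renamed
unchanged = df.rename(columns={'age': 'Years'})
print(list(unchanged.columns))

# With errors='raise', the same typo raises a KeyError instead
try:
    df.rename(columns={'age': 'Years'}, errors='raise')
    typo_detected = False
except KeyError:
    typo_detected = True
print(typo_detected)
```

Passing errors='raise' is a cheap safeguard when the column names come from an external file and may change.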
Renaming Multiple Columns
Renaming a single column is easy, but what if we want to rename multiple columns at once? In such cases, we can use the same rename method with a dictionary of old and new column names.
# Create a sample DataFrame
import pandas as pd
data = {'Name': ['John', 'Alex', 'Peter'],
'Age': [25, 24, 28],
'Department': ['IT', 'HR', 'Marketing']}
df = pd.DataFrame(data)
# Rename the 'Age' and 'Department' columns
df = df.rename(columns={'Age': 'Years', 'Department': 'Dept'})
# Print the DataFrame
print(df)
Output:
Name Years Dept
0 John 25 IT
1 Alex 24 HR
2 Peter 28 Marketing
Here, we have renamed two columns, Age to Years and Department to Dept, using the rename method with a dictionary of old and new column names.
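When many columns follow the same pattern, the columns argument of rename also accepts a function in place of a dictionary; the function is applied to every column label. A minimal sketch follows, in which the 'emp_' prefix is a made-up example rather than anything from the data above.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alex', 'Peter'],
                   'Age': [25, 24, 28],
                   'Department': ['IT', 'HR', 'Marketing']})

# Pass a callable instead of a dictionary; it is applied to every column label
lowered = df.rename(columns=str.lower)
print(list(lowered.columns))

# A lambda works the same way, here adding a hypothetical 'emp_' prefix
prefixed = df.rename(columns=lambda col: f'emp_{col.lower()}')
print(list(prefixed.columns))
```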
Renaming Columns using the set_axis Method
Another way to rename columns in Pandas DataFrame is by using the set_axis method. It replaces all the labels of an axis at once, so the new names are matched to the columns by position rather than by their old names.
# Create a sample DataFrame
import pandas as pd
data = {'Name': ['John', 'Alex', 'Peter'],
'Age': [25, 24, 28],
'Gender': ['Male', 'Male', 'Male']}
df = pd.DataFrame(data)
# Replace all column labels by position: 'Age' becomes 'Years' and 'Gender' becomes 'b'
df = df.set_axis(['Name', 'Years', 'b'], axis=1)
# Print the DataFrame
print(df)
Output:
Name Years b
0 John 25 Male
1 Alex 24 Male
2 Peter 28 Male
Here, we have used the set_axis method to replace the column labels by position. We pass a list with one label per column, in order, and set the axis to 1, which represents columns. Columns we do not want to change, such as Name, must be repeated in the list. The method returns a new DataFrame, so we assign the result back to df. The inplace parameter seen in older examples was deprecated in pandas 1.5 and removed in pandas 2.0, so it is omitted here.
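Because set_axis replaces every label on the axis, the list it receives needs exactly one entry per column. The following sketch, using lowercase names chosen only for illustration, shows a successful call that leaves the original DataFrame untouched and a call with too few labels, which pandas rejects with a ValueError.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alex', 'Peter'],
                   'Age': [25, 24, 28],
                   'Gender': ['Male', 'Male', 'Male']})

# One new label per column, in order; a new DataFrame is returned
renamed = df.set_axis(['name', 'years', 'gender'], axis=1)
print(list(renamed.columns))
print(list(df.columns))  # the original labels are still in place

# Too few labels: pandas raises a ValueError about the length mismatch
try:
    df.set_axis(['name', 'years'], axis=1)
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)
```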
Renaming Columns using List Comprehension
We can also rename columns in Pandas DataFrame using list comprehension. It is a simple and elegant method that allows renaming multiple columns at once.
# Create a sample DataFrame
import pandas as pd
data = {'Name': ['John', 'Alex', 'Peter'],
'Age': [25, 24, 28],
'Department': ['IT', 'HR', 'Marketing']}
df = pd.DataFrame(data)
# Rebuild every column name: replace underscores with spaces and title-case the result
df.columns = [col.replace('_', ' ').title() for col in df.columns]
# Print the DataFrame
print(df)
Output:
Name Age Department
0 John 25 IT
1 Alex 24 HR
2 Peter 28 Marketing
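Here, we have used a list comprehension to rebuild every column name, replacing underscores with spaces and capitalizing the first letter of each word using the title() method. The sample column names contain no underscores and are already capitalized, so the output is unchanged; the same line would turn a name such as first_name into First Name.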
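To see the transformation actually change something, the same list comprehension can be applied to a DataFrame whose column names do contain underscores. The snake_case names below are invented for this illustration.

```python
import pandas as pd

# Hypothetical snake_case column names, as often found in raw exports
df = pd.DataFrame({'first_name': ['John', 'Alex', 'Peter'],
                   'age_years': [25, 24, 28],
                   'home_department': ['IT', 'HR', 'Marketing']})

# Replace underscores with spaces and title-case each word
df.columns = [col.replace('_', ' ').title() for col in df.columns]
print(list(df.columns))
# ['First Name', 'Age Years', 'Home Department']
```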
DataFrame Rename by Index
Renaming a column by its index position is also possible in Pandas DataFrame. The rename method matches column labels, not positions, so we first look up the label at the position with df.columns and use it as the dictionary key.
# Create a sample DataFrame
import pandas as pd
data = {'Name': ['John', 'Alex', 'Peter'],
'Age': [25, 24, 28],
'Department': ['IT', 'HR', 'Marketing']}
df = pd.DataFrame(data)
# Rename the column at index position 2 to 'Dept'
df = df.rename(columns={df.columns[2]: 'Dept'})
# Print the DataFrame
print(df)
Output:
Name Age Dept
0 John 25 IT
1 Alex 24 HR
2 Peter 28 Marketing
Here, we have used df.columns[2] to look up the label at index position 2 (Department) and the rename method to change it to Dept. Passing the integer 2 directly as the key would not work here: no column is labeled 2, so rename would leave the DataFrame unchanged.
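The position-based approach extends to several columns at once. The sketch below first confirms that a bare integer key has no effect on string-labeled columns, then translates a position-to-name dictionary into the label-to-name dictionary that rename expects. The name new_names_by_position is a hypothetical helper variable, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alex', 'Peter'],
                   'Age': [25, 24, 28],
                   'Department': ['IT', 'HR', 'Marketing']})

# An integer key does not match any string label, so nothing changes
ignored = df.rename(columns={2: 'Dept'})
print(list(ignored.columns))

# Hypothetical mapping from column position to new name
new_names_by_position = {1: 'Years', 2: 'Dept'}

# Translate positions into current labels, then rename as usual
mapping = {df.columns[pos]: new for pos, new in new_names_by_position.items()}
renamed = df.rename(columns=mapping)
print(list(renamed.columns))
```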
DataFrame Rename Column with List
We can also rename columns in Pandas DataFrame by assigning a list of new names directly to df.columns. The list must contain one name for every column, in order. Let's see how it is done.
# Create a sample DataFrame
import pandas as pd
data = {'Name': ['John', 'Alex', 'Peter'],
'Age': [25, 24, 28],
'Department': ['IT', 'HR', 'Marketing']}
df = pd.DataFrame(data)
# Replace all three column names by assigning a list of new names
df.columns = ['ID', 'Years', 'Dept']
# Print the DataFrame
print(df)
Output:
ID Years Dept
0 John 25 IT
1 Alex 24 HR
2 Peter 28 Marketing
Here, we have assigned a list of new column names to rename the columns Name, Age, and Department to ID, Years, and Dept respectively.
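Assigning to df.columns modifies the DataFrame in place. If the original should stay untouched, one possible alternative is to pair the current labels with the new names using zip and pass the resulting dictionary to rename, which returns a new DataFrame. This is a sketch of that idea; the variable new_names is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alex', 'Peter'],
                   'Age': [25, 24, 28],
                   'Department': ['IT', 'HR', 'Marketing']})

new_names = ['ID', 'Years', 'Dept']

# Pair each current label with its new name, then rename into a new DataFrame
renamed = df.rename(columns=dict(zip(df.columns, new_names)))
print(list(renamed.columns))
print(list(df.columns))  # the original DataFrame keeps its labels
```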
Conclusion
In this tutorial, we have learned how to rename columns in Pandas DataFrame using different methods: the rename method, the set_axis method, list comprehension, renaming by index position, and renaming with a list. Along the way, we have seen how clear, consistent column names make your data analysis more organized and informative.
Column renaming is a critical step in data analysis as it enhances the readability of the data and helps to understand the relationships between different columns. By using the methods discussed above, you can easily rename columns in Pandas DataFrame and make your data analysis more effective and efficient.
We hope this tutorial was helpful and informative. Happy coding!
Frequently Asked Questions
How can you rename a column in a DataFrame?
To rename a column in a DataFrame, you can use the rename() method in pandas. Pass a dictionary (or other mapping) from old column names to new column names to the columns parameter. This method allows you to rename a single column or multiple columns at once.
How to rename column by column index in Pandas?
The rename() method matches column labels rather than positions, so look up the current label first with df.columns[i] and use it as the dictionary key, for example df.rename(columns={df.columns[2]: 'Dept'}). Integer keys only match when the columns are actually labeled with integers.
How do you rename multiple columns in a DataFrame?
To rename multiple columns in a DataFrame, you can use the rename() method with the columns parameter. Pass a dictionary where the keys are the current column names and the values are the new column names. All listed columns are renamed in one call, and columns not listed in the dictionary keep their names.