
How to Drop a Column in Pandas DataFrame


As a data scientist, you spend much of your time manipulating data in a DataFrame, and one task that comes up frequently is dropping columns that are not needed for analysis. In this tutorial, we will look at how to drop a column in a Pandas DataFrame. We will cover removing a column by name, by position, several columns at once, and columns that meet a condition.

Want to quickly create data visualizations from a Pandas DataFrame with no code?

PyGWalker is a Python library for Exploratory Data Analysis with Visualization. PyGWalker can simplify your Jupyter Notebook data analysis and data visualization workflow by turning your pandas DataFrame (or polars DataFrame) into a Tableau-alternative user interface for visual exploration.


Pandas DataFrame Overview

Before diving into the details of dropping columns, let’s have an overview of the Pandas DataFrame.

A DataFrame is a two-dimensional table-like data structure with rows and columns. Each column in a DataFrame is a Series. A Series is a one-dimensional data structure that holds an array of values with a label called an index. In addition, a DataFrame can have row and column indices for fast and efficient data access.

Pandas DataFrame is a powerful tool for handling and manipulating data in Python. It allows you to perform complex data analysis, data cleaning, data transformation, and data visualization tasks.

Dropping a Column in Pandas DataFrame

Now let us get started with the process of dropping a column in Pandas DataFrame. There are several ways to drop a column in a DataFrame, depending on the requirement. We will look at some of the popular methods below.

Drop a Column Using the drop Method

The easiest method to remove a column from a DataFrame is by using the drop method. You can use the drop method with the parameter axis=1 to indicate that you want to remove a column.

# create a sample DataFrame
import pandas as pd
data = {'name': ['Alex', 'Bob', 'Clarke', 'David'], 'age': [20, 25, 19, 18], 'city': ['New York', 'Paris', 'London', 'Tokyo']}
df = pd.DataFrame(data)
# drop the column 'city'
df = df.drop('city', axis=1)
print(df.head())

Output:

     name  age
0    Alex   20
1     Bob   25
2  Clarke   19
3   David   18

In the above example, we created a sample DataFrame with three columns named name, age, and city. We used the drop method with the parameter axis=1 to remove the column city. We then printed the updated DataFrame that only has two columns, name and age.
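As a small variation on the example above, the drop method also accepts a columns keyword, which removes the need for axis=1. The sketch below also shows that drop returns a new DataFrame and leaves the original untouched; the variable name trimmed is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alex', 'Bob', 'Clarke', 'David'],
    'age': [20, 25, 19, 18],
    'city': ['New York', 'Paris', 'London', 'Tokyo'],
})

# columns='city' is equivalent to drop('city', axis=1)
trimmed = df.drop(columns='city')

# drop returns a new DataFrame; df itself still has all three columns
print(list(trimmed.columns))  # ['name', 'age']
print(list(df.columns))       # ['name', 'age', 'city']
```

Because the original is not modified, the result has to be assigned to a name (as in df = df.drop(...)) for the change to take effect.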

Drop a Column Using the Subsetting Method

Another way to drop a column from a DataFrame is to use the subsetting method [] with the del statement. The del statement removes the column directly from the DataFrame object.

# create a sample DataFrame
import pandas as pd
data = {'name': ['Alex', 'Bob', 'Clarke', 'David'], 'age': [20, 25, 19, 18], 'city': ['New York', 'Paris', 'London', 'Tokyo']}
df = pd.DataFrame(data)
# remove the column 'city'
del df['city']
print(df.head())

Output:

     name  age
0    Alex   20
1     Bob   25
2  Clarke   19
3   David   18

In the above example, we created a sample DataFrame with three columns named name, age, and city. We used the subsetting method [] with the del statement to remove the column city. We then printed the updated DataFrame that only has two columns, name and age.
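Unlike drop, the del statement modifies the DataFrame in place. The following sketch illustrates two consequences: any other name bound to the same object sees the change, and deleting a column that does not exist raises a KeyError. The names same_object and missing are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alex', 'Bob', 'Clarke', 'David'],
    'age': [20, 25, 19, 18],
    'city': ['New York', 'Paris', 'London', 'Tokyo'],
})

same_object = df   # a second name for the same DataFrame, not a copy
del df['city']     # removes the column in place

# the change is visible through both names
print(list(same_object.columns))  # ['name', 'age']

# deleting a column that is no longer there raises KeyError
missing = False
try:
    del df['city']
except KeyError:
    missing = True
print(missing)  # True
```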

Drop Multiple Columns

Sometimes it is necessary to remove multiple columns from a DataFrame. You can use the drop method with a list of column names to remove multiple columns.

# create a sample DataFrame
import pandas as pd
data = {'name': ['Alex', 'Bob', 'Clarke', 'David'], 'age': [20, 25, 19, 18], 'city': ['New York', 'Paris', 'London', 'Tokyo'], 'occupation': ['Engineer', 'Doctor', 'Artist', 'Lawyer']}
df = pd.DataFrame(data)
# drop the columns 'city' and 'occupation'
df = df.drop(['city', 'occupation'], axis=1)
print(df.head())

Output:

     name  age
0    Alex   20
1     Bob   25
2  Clarke   19
3   David   18

In the above example, we created a sample DataFrame with four columns named name, age, city, and occupation. We used the drop method with a list of column names to remove the columns city and occupation. We then printed the updated DataFrame that only has two columns, name and age.
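When the list of columns to remove may contain names that are not present, drop raises a KeyError by default. A minimal sketch of one way to handle this is the errors='ignore' option; the 'salary' column here is a hypothetical name that deliberately does not exist in the DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alex', 'Bob', 'Clarke', 'David'],
    'age': [20, 25, 19, 18],
    'city': ['New York', 'Paris', 'London', 'Tokyo'],
    'occupation': ['Engineer', 'Doctor', 'Artist', 'Lawyer'],
})

# 'salary' is not a column of df (hypothetical name for illustration)
to_remove = ['city', 'occupation', 'salary']

# errors='ignore' skips labels that are missing instead of raising KeyError
trimmed = df.drop(columns=to_remove, errors='ignore')
print(list(trimmed.columns))  # ['name', 'age']
```

Silently ignoring missing labels is convenient in reusable cleaning code, but it can also hide typos in column names, so the default behavior is often the safer choice.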

Drop Columns Using a Column Index

You can also drop a column from a DataFrame by its position. The drop method works with labels, not positions, so first look up the label at the desired position with df.columns and then pass that label to drop with axis=1.

# create a sample DataFrame
import pandas as pd
data = {'name': ['Alex', 'Bob', 'Clarke', 'David'], 'age': [20, 25, 19, 18], 'city': ['New York', 'Paris', 'London', 'Tokyo']}
df = pd.DataFrame(data)
# remove the column at index 2, i.e., 'city'
df = df.drop(df.columns[2], axis=1)
print(df.head())

Output:

     name  age
0    Alex   20
1     Bob   25
2  Clarke   19
3   David   18

In the above example, we created a sample DataFrame with three columns named name, age, and city. We used df.columns[2] to look up the label of the column at position 2, which is city, and passed it to the drop method with axis=1. We then printed the updated DataFrame that only has two columns, name and age.
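The same idea extends to several positions at once. The sketch below indexes df.columns with a list of positions to get the matching labels and passes them to drop; the name trimmed is illustrative. Note that drop still works by label, so if a DataFrame has duplicate column names, every column sharing a dropped label is removed.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alex', 'Bob', 'Clarke', 'David'],
    'age': [20, 25, 19, 18],
    'city': ['New York', 'Paris', 'London', 'Tokyo'],
})

# look up the labels at positions 1 and 2 ('age' and 'city')
labels = df.columns[[1, 2]]

# drop those labels; only 'name' remains
trimmed = df.drop(columns=labels)
print(list(trimmed.columns))  # ['name']
```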

Drop Columns Based on a Condition

You can also remove columns based on a condition. For example, you can remove all columns that contain only NaN values by using the dropna method.

# create a sample DataFrame with a column having all NaN values
import pandas as pd
import numpy as np
data = {'name': ['Alex', 'Bob', 'Clarke', 'David'], 'age': [20, 25, 19, 18], 'city': [np.nan, np.nan, np.nan, np.nan], 'occupation': ['Engineer', 'Doctor', 'Artist', 'Lawyer']}
df = pd.DataFrame(data)
# delete the columns that have all NaN values
df = df.dropna(how='all', axis=1)
print(df.head())

Output:

     name  age  occupation
0    Alex   20    Engineer
1     Bob   25      Doctor
2  Clarke   19      Artist
3   David   18      Lawyer

In the above example, we created a sample DataFrame with four columns named name, age, city, and occupation. We set the values in the city column to NaN. We used the dropna method with the parameter how='all' and axis=1 to remove the columns that have all NaN values. We then printed the updated DataFrame that only has three columns, name, age, and occupation.
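A related case is a column that is mostly, but not entirely, missing. As a sketch, the thresh parameter of dropna keeps only the columns with at least that many non-NaN values; the threshold of 3 and the sample data below are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['Alex', 'Bob', 'Clarke', 'David'],
    'age': [20, 25, 19, 18],
    'city': [np.nan, 'Paris', np.nan, np.nan],  # only one non-NaN value
})

# keep columns that have at least 3 non-NaN values
trimmed = df.dropna(axis=1, thresh=3)
print(list(trimmed.columns))  # ['name', 'age']
```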

Conclusion

Dropping a column from a Pandas DataFrame is an essential operation for any data scientist. In this tutorial, we covered removing columns by name with drop and del, removing several columns at once, removing a column by position, and removing columns that contain only NaN values with dropna. We hope this tutorial helps you streamline your data processing workflow with Pandas.

Frequently Asked Questions

  1. How to drop a column in a Python DataFrame?

    To drop a column in a Python DataFrame, you can use the drop() method and specify the column name along with the axis parameter set to 1. This returns a new DataFrame without the specified column. Alternatively, you can use the del statement, as in del df['column_name'], to delete the column in place.

  2. Can multiple columns be dropped simultaneously in a Python DataFrame?

    Yes, multiple columns can be dropped simultaneously in a Python DataFrame. You can pass a list of column names to the drop() method to remove them all in one call. Calling drop() several times with one column name each time gives the same result, but the list form is shorter.

  3. Is it possible to drop columns based on certain conditions in a Python DataFrame?

    Yes, it is possible to drop columns based on certain conditions in a Python DataFrame. You can build a boolean mask over the columns, use it to pick out the labels that meet the condition, and pass those labels to the drop() method. You can also use the loc indexer with the inverted mask to keep only the columns you want. For columns made up entirely of missing values, the dropna() method with axis=1 does this directly.
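To illustrate the boolean-mask approach from the last answer, here is a minimal sketch that removes every column whose name starts with a given prefix. The 'tmp_' prefix and the column names are hypothetical and chosen only for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alex', 'Bob'],
    'tmp_id': [1, 2],            # hypothetical temporary column
    'age': [20, 25],
    'tmp_flag': [True, False],   # hypothetical temporary column
})

# boolean mask: True for each column whose name starts with 'tmp_'
mask = df.columns.str.startswith('tmp_')

# option 1: pass the matching labels to drop
trimmed = df.drop(columns=df.columns[mask])

# option 2: keep the non-matching columns with loc and the inverted mask
kept = df.loc[:, ~mask]

print(list(trimmed.columns))  # ['name', 'age']
print(trimmed.equals(kept))   # True
```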