Pandas Drop Column: How to Remove Columns from a DataFrame

DataFrames from real-world sources rarely arrive with only the columns you need. CSV exports include metadata columns, database queries pull extra fields, and API responses contain nested data you've already flattened. Before any meaningful analysis, you need to remove the irrelevant columns, and sometimes rename the ones you keep or add new computed columns. Getting this wrong, for example by dropping the wrong column or by modifying the original DataFrame when you meant to work on a copy, causes data-loss bugs that can be hard to trace.

Pandas provides several methods to remove columns, each suited to different situations. The drop() method is the most versatile, but del, pop(), and column selection offer useful alternatives. This guide covers every approach with clear examples showing when to use each one.

Using df.drop() -- The Standard Approach

The drop() method is the primary way to remove columns. Pass column names and set axis=1 (or use the columns parameter).

Drop a Single Column

import pandas as pd
 
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'salary': [50000, 60000, 70000],
    'department': ['Engineering', 'Marketing', 'Sales'],
})
 
# Method 1: Using columns parameter (recommended)
df_clean = df.drop(columns=['salary'])
print(df_clean)
 
# Method 2: Using axis=1
df_clean = df.drop('salary', axis=1)
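As a quick sanity check, the snippet below (a small sketch reusing the same sample data) confirms the forms are interchangeable. It also shows that columns= accepts a single label as well as a list, and that axis='columns' is a more readable synonym for axis=1.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'salary': [50000, 60000, 70000],
    'department': ['Engineering', 'Marketing', 'Sales'],
})

# columns= accepts a single label or a list of labels
by_columns_scalar = df.drop(columns='salary')
by_columns_list = df.drop(columns=['salary'])

# axis=1 and axis='columns' mean the same thing
by_axis = df.drop('salary', axis=1)
by_axis_name = df.drop('salary', axis='columns')

print(by_columns_scalar.columns.tolist())  # ['name', 'age', 'department']
print(by_columns_scalar.equals(by_axis))   # True
```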

Drop Multiple Columns

import pandas as pd
 
df = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'email': ['a@test.com', 'b@test.com', 'c@test.com'],
    'temp_col': [None, None, None],
    'internal_id': ['X1', 'X2', 'X3'],
})
 
# Drop multiple columns at once
df_clean = df.drop(columns=['temp_col', 'internal_id', 'email'])
print(df_clean)
#    id     name
# 0   1    Alice
# 1   2      Bob
# 2   3  Charlie
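If you know the positions of the columns rather than their names, one approach is to look the labels up through df.columns and pass them to drop(). The sketch below assumes the same sample DataFrame; the positions list is purely illustrative. Keep in mind that drop() works on labels, so if several columns share a label, all of them are removed.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'email': ['a@test.com', 'b@test.com', 'c@test.com'],
    'temp_col': [None, None, None],
    'internal_id': ['X1', 'X2', 'X3'],
})

# Translate positions into labels, then drop those labels
positions = [2, 3, 4]  # illustrative: email, temp_col, internal_id
df_clean = df.drop(columns=df.columns[positions])
print(df_clean.columns.tolist())  # ['id', 'name']

# Positional slicing keeps a range of columns instead
df_first_two = df.iloc[:, :2]
print(df_first_two.columns.tolist())  # ['id', 'name']
```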

inplace Parameter

By default, drop() returns a new DataFrame. Use inplace=True to modify the original:

import pandas as pd
 
df = pd.DataFrame({'a': [1], 'b': [2], 'c': [3]})
 
# Returns new DataFrame (original unchanged)
new_df = df.drop(columns=['b'])
print(df.columns.tolist())      # ['a', 'b', 'c'] (unchanged)
print(new_df.columns.tolist())  # ['a', 'c']
 
# Modifies original DataFrame
df.drop(columns=['b'], inplace=True)
print(df.columns.tolist())  # ['a', 'c']
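One pitfall worth guarding against: with inplace=True, drop() modifies the DataFrame and returns None. Writing df = df.drop(..., inplace=True) therefore replaces your DataFrame with None. The sketch below demonstrates this and shows plain reassignment, which many codebases prefer because the data flow is explicit.

```python
import pandas as pd

df = pd.DataFrame({'a': [1], 'b': [2], 'c': [3]})

# inplace=True mutates df and returns None
result = df.drop(columns=['b'], inplace=True)
print(result)               # None
print(df.columns.tolist())  # ['a', 'c']

# Reassignment achieves the same effect without inplace
df2 = pd.DataFrame({'a': [1], 'b': [2], 'c': [3]})
df2 = df2.drop(columns=['b'])
print(df2.columns.tolist())  # ['a', 'c']
```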

Handling Missing Columns with errors Parameter

import pandas as pd
 
df = pd.DataFrame({'a': [1], 'b': [2], 'c': [3]})
 
# Default: raises KeyError if column doesn't exist
# df.drop(columns=['d'])  # KeyError: "['d'] not found in axis"
 
# Ignore missing columns
df_clean = df.drop(columns=['b', 'd'], errors='ignore')
print(df_clean.columns.tolist())  # ['a', 'c']
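errors='ignore' skips missing names silently, which can hide a typo in a column name. If you want to know which requested columns were absent, one option is to intersect the candidates with df.columns yourself, as in this sketch (candidates is an illustrative list):

```python
import pandas as pd

df = pd.DataFrame({'a': [1], 'b': [2], 'c': [3]})
candidates = ['b', 'd']  # illustrative: 'd' does not exist

# Keep only the candidate names that actually exist in df
present = df.columns.intersection(candidates)
df_clean = df.drop(columns=present)
print(df_clean.columns.tolist())  # ['a', 'c']

# Report the names that were not found instead of ignoring them silently
missing = [c for c in candidates if c not in df.columns]
print(missing)  # ['d']
```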

Using del Statement

The del statement removes a column in place. It's concise but limited to one column at a time.

import pandas as pd
 
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
 
del df['b']
print(df.columns.tolist())  # ['a', 'c']

Limitations: Cannot delete multiple columns at once. Cannot ignore missing columns (raises KeyError). Always modifies in place.
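Both limitations can be worked around in plain Python, though at that point drop() is usually the simpler choice. A minimal sketch, assuming you want to remove several columns and tolerate names that may not exist:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

# del removes one column per statement, so loop over the names
for col in ['b', 'c', 'd']:
    # Membership check avoids the KeyError for the missing 'd'
    if col in df.columns:
        del df[col]

print(df.columns.tolist())  # ['a']
```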

Using df.pop()

pop() removes a column and returns it as a Series. Useful when you need the removed column for further use.

import pandas as pd
 
df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'target': [1, 0],
    'feature1': [10, 20],
    'feature2': [30, 40],
})
 
# Extract target column while removing it from DataFrame
y = df.pop('target')
X = df
 
print(y)
# 0    1
# 1    0
# Name: target, dtype: int64
 
print(X)
#     name  feature1  feature2
# 0  Alice        10        30
# 1    Bob        20        40
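Because pop() mutates the DataFrame, X in the example above is just another name for the modified df, not a copy. If you need the original DataFrame to stay intact, a non-mutating alternative is to select the target and drop it in a separate step, as sketched here with the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'target': [1, 0],
    'feature1': [10, 20],
    'feature2': [30, 40],
})

# Select the target column and build the features without mutating df
y = df['target']
X = df.drop(columns=['target'])

print(df.columns.tolist())  # ['name', 'target', 'feature1', 'feature2']
print(X.columns.tolist())   # ['name', 'feature1', 'feature2']
```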

Selecting Columns (Inverse of Dropping)

Sometimes it's easier to select the columns you want rather than listing the ones to remove.

import pandas as pd
 
df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'age': [25, 30],
    'salary': [50000, 60000],
    'dept': ['Eng', 'Mkt'],
    'internal_id': ['X1', 'X2'],
})
 
# Keep only specific columns
df_clean = df[['name', 'age', 'salary']]
 
# Drop columns by selecting everything except them
df_clean = df.loc[:, df.columns != 'internal_id']
 
# Keep columns matching a condition
df_numeric = df.select_dtypes(include='number')
print(df_numeric)
#    age  salary
# 0   25   50000
# 1   30   60000
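The df.columns != 'internal_id' comparison handles only one name. To exclude several columns, one option is a boolean mask built with Index.isin(), which keeps the original column order. Index.difference() also works but sorts the labels by default, as this sketch shows (exclude is an illustrative list):

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'age': [25, 30],
    'salary': [50000, 60000],
    'dept': ['Eng', 'Mkt'],
    'internal_id': ['X1', 'X2'],
})

exclude = ['internal_id', 'dept']  # illustrative

# Boolean mask over the column labels; original order is preserved
df_clean = df.loc[:, ~df.columns.isin(exclude)]
print(df_clean.columns.tolist())  # ['name', 'age', 'salary']

# difference() gives the same set of labels, but sorted
remaining = df.columns.difference(exclude)
print(remaining.tolist())  # ['age', 'name', 'salary']
```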

Drop Columns by Pattern or Condition

Drop Columns by Name Pattern

import pandas as pd
 
df = pd.DataFrame({
    'name': ['Alice'], 'age': [25],
    'temp_1': [None], 'temp_2': [None],
    'internal_flag': [True],
})
 
# Drop columns starting with 'temp_'
cols_to_drop = [c for c in df.columns if c.startswith('temp_')]
df_clean = df.drop(columns=cols_to_drop)
print(df_clean.columns.tolist())  # ['name', 'age', 'internal_flag']
 
# Drop columns containing 'internal'
cols_to_drop = [c for c in df.columns if 'internal' in c]
df_clean = df.drop(columns=cols_to_drop)
 
# Using filter() to keep matching columns
df_temps = df.filter(like='temp')  # Keeps only columns containing 'temp'
df_no_temps = df.drop(columns=df.filter(like='temp').columns)
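The same patterns can be expressed without a Python loop, using the vectorized .str methods on the column Index or a regular expression passed to filter(regex=...). This sketch assumes all column labels are strings, since .str is not available on a non-string Index.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice'], 'age': [25],
    'temp_1': [None], 'temp_2': [None],
    'internal_flag': [True],
})

# Vectorized string test on the column labels
df_clean = df.loc[:, ~df.columns.str.startswith('temp_')]
print(df_clean.columns.tolist())  # ['name', 'age', 'internal_flag']

# Regex: labels starting with 'temp_' or containing 'internal'
to_drop = df.filter(regex=r'^temp_|internal').columns
df_core = df.drop(columns=to_drop)
print(df_core.columns.tolist())  # ['name', 'age']
```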

Drop Columns by Data Type

import pandas as pd
 
df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'age': [25, 30],
    'score': [95.5, 87.3],
    'active': [True, False],
})
 
# Drop all non-numeric columns
df_numeric = df.select_dtypes(include='number')
 
# Drop all object (string) columns
df_no_strings = df.select_dtypes(exclude='object')
print(df_no_strings.columns.tolist())  # ['age', 'score', 'active']
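select_dtypes() returns a new DataFrame directly. If you would rather stay with drop(), for example to keep using errors= or inplace=, you can take the .columns of a select_dtypes() result and drop those labels. A short sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'age': [25, 30],
    'score': [95.5, 87.3],
    'active': [True, False],
})

# Labels of the boolean columns, then drop them by name
bool_cols = df.select_dtypes(include='bool').columns
df_no_bools = df.drop(columns=bool_cols)
print(df_no_bools.columns.tolist())  # ['name', 'age', 'score']

# Drop all numeric columns (bool is not counted as 'number')
df_non_numeric = df.drop(columns=df.select_dtypes(include='number').columns)
print(df_non_numeric.columns.tolist())  # ['name', 'active']
```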

Drop Columns and Reorder the Rest

After dropping columns, you may want to reorder the remaining columns to match a preferred layout.
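A minimal sketch of that workflow follows. The sample columns and the preferred ordering are hypothetical. Indexing with a list of labels both orders the columns and raises KeyError if a name is misspelled; df.reindex(columns=...) is an alternative that inserts all-NaN columns for unknown names instead of raising.

```python
import pandas as pd

df = pd.DataFrame({
    'internal_id': ['X1', 'X2'],
    'salary': [50000, 60000],
    'name': ['Alice', 'Bob'],
    'age': [25, 30],
})

df_clean = df.drop(columns=['internal_id'])

# Hypothetical preferred layout for the remaining columns
preferred = ['name', 'age', 'salary']
df_clean = df_clean[preferred]
print(df_clean.columns.tolist())  # ['name', 'age', 'salary']

# Move one column to the front, keeping the others in their current order
cols = ['age'] + [c for c in df_clean.columns if c != 'age']
df_age_first = df_clean[cols]
print(df_age_first.columns.tolist())  # ['age', 'name', 'salary']
```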

Drop Columns with Too Many Missing Values

import pandas as pd
import numpy as np
 
df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [1, np.nan, np.nan, np.nan, np.nan],
    'c': [1, 2, np.nan, 4, 5],
    'd': [np.nan, np.nan, np.nan, np.nan, np.nan],
})
 
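# Drop columns where more than 50% of values are missing.
# thresh is the minimum number of non-null values a column needs to be kept.
# Round up: int(2.5) would give 2, which keeps a column that is 60% missing.
threshold = int(np.ceil(len(df) * 0.5))  # 3 for 5 rows
df_clean = df.dropna(axis=1, thresh=threshold)
print(df_clean.columns.tolist())  # ['a', 'c']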
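If a percentage is easier to reason about than a count, you can compute the share of missing values per column with isna().mean() and keep columns below a cutoff. This sketch uses the same sample data and a cutoff of "at most 50% missing":

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [1, np.nan, np.nan, np.nan, np.nan],
    'c': [1, 2, np.nan, 4, 5],
    'd': [np.nan, np.nan, np.nan, np.nan, np.nan],
})

# Fraction of missing values in each column
missing_share = df.isna().mean()
print(missing_share.tolist())  # [0.0, 0.8, 0.2, 1.0]

# Keep columns with at most 50% missing values
df_clean = df.loc[:, missing_share <= 0.5]
print(df_clean.columns.tolist())  # ['a', 'c']
```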

Method Comparison

| Method | Returns | In-place | Multiple columns | Missing column handling |
|---|---|---|---|---|
| df.drop(columns=...) | New DataFrame | Optional (inplace=True) | Yes | errors='ignore' |
| del df[col] | Nothing | Always | No (one at a time) | Raises KeyError |
| df.pop(col) | Removed Series | Always | No (one at a time) | Raises KeyError |
| df[cols_to_keep] | New DataFrame | No | Yes (inverse) | Raises KeyError |
| df.select_dtypes() | New DataFrame | No | By dtype | N/A |

Visualizing Your Cleaned DataFrame

After dropping columns and cleaning your data, PyGWalker provides an interactive Tableau-style interface to explore the cleaned DataFrame directly in Jupyter:

import pygwalker as pyg
 
# After cleaning your DataFrame
walker = pyg.walk(df_clean)

This lets you drag-and-drop remaining columns to build charts without writing any plotting code.

FAQ

How do I drop a column in pandas?

Use df.drop(columns=['column_name']) to remove a column and return a new DataFrame. For in-place removal, add inplace=True. You can also use del df['column_name'] for a quick in-place delete, or df.pop('column_name') to remove and return the column as a Series.

How do I drop multiple columns at once?

Pass a list of column names to df.drop(columns=['col1', 'col2', 'col3']). This removes all specified columns in a single operation and returns a new DataFrame.

How do I drop columns conditionally (by pattern or data type)?

For name patterns, use a list comprehension: df.drop(columns=[c for c in df.columns if c.startswith('temp_')]). For data types, use df.select_dtypes(exclude='object') to drop string columns, or df.select_dtypes(include='number') to keep only numeric columns.

What is the difference between drop() and del for removing columns?

df.drop() returns a new DataFrame by default, can handle multiple columns at once, and has an errors='ignore' option for missing columns. del df[col] always modifies in place, works on one column at a time, and raises KeyError if the column doesn't exist.

How do I drop columns with missing values?

Use df.dropna(axis=1) to drop any column that has at least one NaN. Use df.dropna(axis=1, thresh=n) to keep only columns with at least n non-null values. For custom thresholds, filter by the percentage of nulls: df.loc[:, df.isnull().mean() < 0.5] keeps columns with less than 50% missing data.

Conclusion

For most situations, df.drop(columns=[...]) is the right choice -- it's explicit, handles multiple columns, and returns a new DataFrame by default. Use del for quick in-place single-column removal, pop() when you need the removed column, and column selection or select_dtypes() when it's easier to specify what you want to keep rather than what to remove. To remove duplicate rows instead of columns, see pandas drop_duplicates(). To filter rows by condition, use boolean indexing or the .query() method.
