Pandas Drop Column: How to Remove Columns from a DataFrame
DataFrames from real-world sources rarely arrive with only the columns you need. CSV exports include metadata columns, database queries pull extra fields, and API responses contain nested data you've already flattened. Before any meaningful analysis, you need to remove the irrelevant columns -- and sometimes rename the ones you keep or add new computed columns. Getting this wrong -- accidentally dropping the wrong column or modifying the original DataFrame when you intended to create a copy -- causes data loss bugs that can be difficult to trace.
Pandas provides several methods to remove columns, each suited to different situations. The drop() method is the most versatile, but del, pop(), and column selection offer useful alternatives. This guide covers every approach with clear examples showing when to use each one.
Using df.drop() -- The Standard Approach
The drop() method is the primary way to remove columns. Pass column names and set axis=1 (or use the columns parameter).
Drop a Single Column
import pandas as pd
df = pd.DataFrame({
'name': ['Alice', 'Bob', 'Charlie'],
'age': [25, 30, 35],
'salary': [50000, 60000, 70000],
'department': ['Engineering', 'Marketing', 'Sales'],
})
# Method 1: Using columns parameter (recommended)
df_clean = df.drop(columns=['salary'])
print(df_clean)
# Method 2: Using axis=1
df_clean = df.drop('salary', axis=1)
Drop Multiple Columns
import pandas as pd
df = pd.DataFrame({
'id': [1, 2, 3],
'name': ['Alice', 'Bob', 'Charlie'],
'email': ['a@test.com', 'b@test.com', 'c@test.com'],
'temp_col': [None, None, None],
'internal_id': ['X1', 'X2', 'X3'],
})
# Drop multiple columns at once
df_clean = df.drop(columns=['temp_col', 'internal_id', 'email'])
print(df_clean)
# id name
# 0 1 Alice
# 1 2 Bob
# 2 3 Charlie
inplace Parameter
By default, drop() returns a new DataFrame. Use inplace=True to modify the original:
import pandas as pd
df = pd.DataFrame({'a': [1], 'b': [2], 'c': [3]})
# Returns new DataFrame (original unchanged)
new_df = df.drop(columns=['b'])
print(df.columns.tolist()) # ['a', 'b', 'c'] (unchanged)
print(new_df.columns.tolist()) # ['a', 'c']
# Modifies original DataFrame
df.drop(columns=['b'], inplace=True)
print(df.columns.tolist()) # ['a', 'c']
Handling Missing Columns with errors Parameter
import pandas as pd
df = pd.DataFrame({'a': [1], 'b': [2], 'c': [3]})
# Default: raises KeyError if column doesn't exist
# df.drop(columns=['d']) # KeyError: "['d'] not found in axis"
# Ignore missing columns
df_clean = df.drop(columns=['b', 'd'], errors='ignore')
print(df_clean.columns.tolist()) # ['a', 'c']
Using del Statement
The del statement removes a column in place. It's concise but limited to one column at a time.
import pandas as pd
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
del df['b']
print(df.columns.tolist()) # ['a', 'c']
Limitations: Cannot delete multiple columns at once. Cannot ignore missing columns (raises KeyError). Always modifies in place.
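If you still prefer del, a minimal sketch of working around these limits is to loop over the names and check membership before each delete. The column list below is illustrative, and 'missing' is a deliberately absent name; df.drop(columns=..., errors='ignore') achieves the same result in one call.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

# del handles one column per statement, so loop over the names.
# Check membership first, because del raises KeyError on a missing column.
for col in ['b', 'c', 'missing']:
    if col in df.columns:
        del df[col]

remaining = df.columns.tolist()
print(remaining)  # ['a']
```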
Using df.pop()
pop() removes a column and returns it as a Series. Useful when you need the removed column for further use.
import pandas as pd
df = pd.DataFrame({
'name': ['Alice', 'Bob'],
'target': [1, 0],
'feature1': [10, 20],
'feature2': [30, 40],
})
# Extract target column while removing it from DataFrame
y = df.pop('target')
X = df  # X refers to the same (now modified) DataFrame, not a copy
print(y)
# 0 1
# 1 0
# Name: target, dtype: int64
print(X)
# name feature1 feature2
# 0 Alice 10 30
# 1 Bob 20 40
Selecting Columns (Inverse of Dropping)
Sometimes it's easier to select the columns you want rather than listing the ones to remove.
import pandas as pd
df = pd.DataFrame({
'name': ['Alice', 'Bob'],
'age': [25, 30],
'salary': [50000, 60000],
'dept': ['Eng', 'Mkt'],
'internal_id': ['X1', 'X2'],
})
# Keep only specific columns
df_clean = df[['name', 'age', 'salary']]
# Drop columns by selecting everything except them
df_clean = df.loc[:, df.columns != 'internal_id']
# Keep columns matching a condition
df_numeric = df.select_dtypes(include='number')
print(df_numeric)
# age salary
# 0 25 50000
# 1 30 60000
Drop Columns by Pattern or Condition
Drop Columns by Name Pattern
import pandas as pd
df = pd.DataFrame({
'name': ['Alice'], 'age': [25],
'temp_1': [None], 'temp_2': [None],
'internal_flag': [True],
})
# Drop columns starting with 'temp_'
cols_to_drop = [c for c in df.columns if c.startswith('temp_')]
df_clean = df.drop(columns=cols_to_drop)
print(df_clean.columns.tolist()) # ['name', 'age', 'internal_flag']
# Drop columns containing 'internal'
cols_to_drop = [c for c in df.columns if 'internal' in c]
df_clean = df.drop(columns=cols_to_drop)
# Using filter() to keep matching columns
df_temps = df.filter(like='temp') # Keeps only columns containing 'temp'
df_no_temps = df.drop(columns=df.filter(like='temp').columns)
Drop Columns by Data Type
import pandas as pd
df = pd.DataFrame({
'name': ['Alice', 'Bob'],
'age': [25, 30],
'score': [95.5, 87.3],
'active': [True, False],
})
# Drop all non-numeric columns (bool columns such as 'active' do not count as numeric here)
df_numeric = df.select_dtypes(include='number')
# Drop all object (string) columns
df_no_strings = df.select_dtypes(exclude='object')
print(df_no_strings.columns.tolist()) # ['age', 'score', 'active']
Drop Columns and Reorder the Rest
After dropping columns, you may want to reorder the remaining columns to match a preferred layout.
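A minimal sketch of that pattern, using a hypothetical DataFrame and a hypothetical preferred order: drop first, then index with a list of column names in the order you want.

```python
import pandas as pd

df = pd.DataFrame({
    'internal_id': ['X1', 'X2'],
    'salary': [50000, 60000],
    'name': ['Alice', 'Bob'],
    'age': [25, 30],
})

# Step 1: drop the unwanted column (returns a new DataFrame)
df_clean = df.drop(columns=['internal_id'])

# Step 2: reorder by indexing with the names in the desired order
preferred_order = ['name', 'age', 'salary']  # illustrative layout
df_clean = df_clean[preferred_order]

print(df_clean.columns.tolist())  # ['name', 'age', 'salary']
```

Because selecting a list of columns both drops and reorders, df[preferred_order] on its own would give the same result here. The two-step form is mainly useful when the drop list is computed separately.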
Drop Columns with Too Many Missing Values
import pandas as pd
import numpy as np
df = pd.DataFrame({
'a': [1, 2, 3, 4, 5],
'b': [1, np.nan, np.nan, np.nan, np.nan],
'c': [1, 2, np.nan, 4, 5],
'd': [np.nan, np.nan, np.nan, np.nan, np.nan],
})
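# Drop columns where more than 50% of values are missing.
# thresh is the minimum count of non-null values a column needs to be kept;
# round up so a column with 2 of 5 values present (60% missing) is dropped
threshold = int(np.ceil(len(df) * 0.5))
df_clean = df.dropna(axis=1, thresh=threshold)
print(df_clean.columns.tolist()) # ['a', 'c']
Method Comparison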
| Method | Returns | In-Place | Multiple Cols | Missing Col Handling |
|---|---|---|---|---|
| df.drop(columns=...) | New DataFrame | Optional (inplace) | Yes | errors='ignore' |
| del df[col] | Nothing | Always | No (one at a time) | Raises KeyError |
| df.pop(col) | Removed Series | Always | No (one at a time) | Raises KeyError |
| df[cols_to_keep] | New DataFrame | No | Yes (inverse) | Raises KeyError |
| df.select_dtypes() | New DataFrame | No | By dtype | N/A |
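The rows of the table can be checked with a short script. This is a sketch on a throwaway DataFrame: make_df is a hypothetical helper that rebuilds the same data before each method, and 'missing' is a deliberately absent column name.

```python
import pandas as pd

def make_df():
    # Hypothetical helper: fresh copy of the same small DataFrame each time
    return pd.DataFrame({'a': [1, 2], 'b': [3.0, 4.0], 'c': ['x', 'y']})

# df.drop: returns a new DataFrame, original untouched, can ignore missing names
df = make_df()
dropped = df.drop(columns=['b', 'missing'], errors='ignore')
drop_result = (dropped.columns.tolist(), df.columns.tolist())

# del: returns nothing, always modifies in place
df = make_df()
del df['b']
del_result = df.columns.tolist()

# pop: returns the removed column as a Series, always modifies in place
df = make_df()
popped = df.pop('b')
pop_result = (type(popped).__name__, df.columns.tolist())

# Column selection: raises KeyError when a requested name is missing
df = make_df()
try:
    df[['a', 'missing']]
    selection_raised = False
except KeyError:
    selection_raised = True

print(drop_result)       # (['a', 'c'], ['a', 'b', 'c'])
print(del_result)        # ['a', 'c']
print(pop_result)        # ('Series', ['a', 'c'])
print(selection_raised)  # True
```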
Visualizing Your Cleaned DataFrame
After dropping columns and cleaning your data, PyGWalker provides an interactive Tableau-style interface to explore the cleaned DataFrame directly in Jupyter:
import pygwalker as pyg
# After cleaning your DataFrame
walker = pyg.walk(df_clean)
This lets you drag-and-drop remaining columns to build charts without writing any plotting code.
FAQ
How do I drop a column in pandas?
Use df.drop(columns=['column_name']) to remove a column and return a new DataFrame. For in-place removal, add inplace=True. You can also use del df['column_name'] for a quick in-place delete, or df.pop('column_name') to remove and return the column as a Series.
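One pitfall with in-place removal is worth a short sketch: with inplace=True, drop() returns None, so assigning the result back to a variable leaves that variable holding None. The variable names here are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1], 'b': [2], 'c': [3]})

# With inplace=True the DataFrame is modified and the call itself returns None
result = df.drop(columns=['b'], inplace=True)
print(result)               # None
print(df.columns.tolist())  # ['a', 'c']

# Pick one style: either reassign the returned copy (no inplace) ...
df = df.drop(columns=['c'])
# ... or call with inplace=True and no assignment, but never combine the two
print(df.columns.tolist())  # ['a']
```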
How do I drop multiple columns at once?
Pass a list of column names to df.drop(columns=['col1', 'col2', 'col3']). This removes all specified columns in a single operation and returns a new DataFrame.
How do I drop columns conditionally (by pattern or data type)?
For name patterns, use a list comprehension: df.drop(columns=[c for c in df.columns if c.startswith('temp_')]). For data types, use df.select_dtypes(exclude='object') to drop string columns, or df.select_dtypes(include='number') to keep only numeric columns.
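When several name patterns apply at once, a regular expression over the column names is one alternative to chaining list comprehensions. The sketch below assumes an illustrative pattern and reuses the column names from the earlier pattern example; it builds a boolean mask with df.columns.str.contains and keeps the columns that do not match.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice'], 'age': [25],
    'temp_1': [None], 'temp_2': [None],
    'internal_flag': [True],
})

# Boolean mask over column names: True where the name matches the regex
pattern = r'^temp_|^internal'  # illustrative pattern
mask = df.columns.str.contains(pattern)

# Keep only the columns whose names do not match
df_clean = df.loc[:, ~mask]
print(df_clean.columns.tolist())  # ['name', 'age']
```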
What is the difference between drop() and del for removing columns?
df.drop() returns a new DataFrame by default, can handle multiple columns at once, and has an errors='ignore' option for missing columns. del df[col] always modifies in place, works on one column at a time, and raises KeyError if the column doesn't exist.
How do I drop columns with missing values?
Use df.dropna(axis=1) to drop any column that has at least one NaN. Use df.dropna(axis=1, thresh=n) to keep only columns with at least n non-null values. For custom thresholds, filter by the percentage of nulls: df.loc[:, df.isnull().mean() < 0.5] keeps columns with less than 50% missing data.
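The percentage-based filter can be sketched as follows, on a small made-up DataFrame. The names missing_frac and max_missing are illustrative, and 0.5 is just the cutoff used in the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [1, np.nan, np.nan, np.nan, np.nan],
    'c': [1, 2, np.nan, 4, 5],
    'd': [np.nan] * 5,
})

# Fraction of missing values per column (a Series indexed by column name)
missing_frac = df.isnull().mean()

# Keep columns whose missing fraction is below the cutoff
max_missing = 0.5  # illustrative cutoff
df_clean = df.loc[:, missing_frac < max_missing]
print(df_clean.columns.tolist())  # ['a', 'c']
```

Working from the fraction rather than a count keeps the cutoff independent of the number of rows.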
Conclusion
For most situations, df.drop(columns=[...]) is the right choice -- it's explicit, handles multiple columns, and returns a new DataFrame by default. Use del for quick in-place single-column removal, pop() when you need the removed column, and column selection or select_dtypes() when it's easier to specify what you want to keep rather than what to remove. To remove duplicate rows instead of columns, see pandas drop_duplicates(). To filter rows by condition, use boolean indexing or the .query() method.
Related Guides
- Rename Columns in Pandas -- rename columns instead of dropping and recreating them
- Add a Column to a DataFrame -- add new columns after dropping unwanted ones
- Reorder DataFrame Columns -- rearrange column order after cleanup
- Filter Rows by Condition -- remove rows instead of columns