Pandas fillna(): Handle Missing Values in DataFrames
Missing values are the silent saboteur of data analysis. A single NaN hiding in a critical column can cause an aggregation to return NaN, a machine learning model to throw an error at training time, or a dashboard chart to render a blank gap where a trend line should be. Real-world datasets almost always contain gaps -- sensor readings drop out, survey respondents skip questions, API responses return null fields, and CSV imports arrive with empty cells. The question is never whether you will encounter missing data, but how you will handle it. For a broader overview of missing data strategies, see the pandas missing values guide.
The pandas fillna() method is the primary tool for replacing missing values with something meaningful. This guide covers every parameter, demonstrates common fill strategies (scalar, dictionary, forward fill, backward fill, mean/median/mode), compares fillna() against dropna() and interpolate(), and shows how to chain these operations into a clean data pipeline. Every code example is copy-ready with expected output.
Detecting Missing Values Before Filling
Before filling anything, you need to know where the gaps are. Pandas provides two detection functions, each with an alias:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'name': ['Alice', 'Bob', None, 'Diana', 'Eve'],
'age': [28, np.nan, 35, np.nan, 42],
'salary': [55000, 62000, np.nan, 48000, np.nan]
})
# isna() returns True for missing values (alias: isnull())
print(df.isna())
Output:
name age salary
0 False False False
1 False True False
2 True False True
3 False True False
4 False False True
Quick summary of missing counts
# Count missing values per column
print(df.isna().sum())
Output:
name 1
age 2
salary 2
dtype: int64
notna() for the inverse check
# notna() returns True for non-missing values
print(df.notna().sum())
Output:
name 4
age 3
salary 3
dtype: int64
| Function | Returns True when | Alias |
|---|---|---|
| isna() | Value is NaN, None, or NaT | isnull() |
| notna() | Value is not missing | notnull() |
These functions work on both DataFrames and individual Series. Use them to audit your data before deciding on a fill strategy.
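As a quick illustration, the same audit works on a standalone Series:
import pandas as pd
import numpy as np
s = pd.Series([1.0, np.nan, 3.0, np.nan])
print(s.isna().sum())
print(s.notna().sum())
Output:
2
2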
Basic fillna() with a Scalar Value
The simplest use of fillna() replaces every NaN in the DataFrame with a single value:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'product': ['Widget', 'Gadget', 'Gizmo'],
'price': [19.99, np.nan, 29.99],
'stock': [100, 50, np.nan]
})
print("Before:")
print(df)
df_filled = df.fillna(0)
print("\nAfter fillna(0):")
print(df_filled)
Output:
Before:
product price stock
0 Widget 19.99 100.0
1 Gadget NaN 50.0
2 Gizmo 29.99 NaN
After fillna(0):
product price stock
0 Widget 19.99 100.0
1 Gadget 0.00 50.0
2 Gizmo 29.99 0.0
This works, but filling a price column with 0 is misleading -- it suggests the product is free. For string columns, you might fill with "Unknown". The key is choosing a fill value that makes semantic sense for each column.
Full Method Signature
DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None)
| Parameter | Type | Default | Description |
|---|---|---|---|
| value | scalar, dict, Series, or DataFrame | None | The value to fill missing entries with |
| method | 'ffill', 'bfill', or None | None | Propagation method for filling gaps (deprecated since pandas 2.1 in favor of ffill()/bfill()) |
| axis | 0 or 1 | None | Axis along which to fill: 0 fills down each column, 1 fills across each row |
| inplace | bool | False | If True, modifies the DataFrame in place |
| limit | int | None | Maximum number of consecutive NaNs to fill |
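The value parameter's flexibility deserves a short illustration: a Series passed as value is aligned on column labels, so you can fill every numeric column with its own statistic in a single call. A minimal sketch:
import pandas as pd
import numpy as np
df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [np.nan, 5.0, 6.0]})
# The Series from df.mean() has the column names as its index,
# so each column is filled with its own mean.
print(df.fillna(df.mean(numeric_only=True)))
Output:
a b
0 1.0 5.5
1 2.0 5.0
2 3.0 6.0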
fillna() with a Dictionary: Different Values per Column
In most real datasets, each column represents a different type of measurement, and a single fill value does not make sense everywhere. Pass a dictionary to fillna() to specify per-column fill values:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'name': ['Alice', None, 'Charlie', 'Diana'],
'age': [28, 34, np.nan, 45],
'department': ['Engineering', 'Sales', None, 'Marketing'],
'salary': [75000, np.nan, 68000, np.nan]
})
fill_values = {
'name': 'Unknown',
'age': df['age'].median(),
'department': 'Unassigned',
'salary': df['salary'].mean()
}
df_filled = df.fillna(fill_values)
print(df_filled)
Output:
name age department salary
0 Alice 28.0 Engineering 75000.0
1 Unknown 34.0 Sales 71500.0
2 Charlie 34.0 Unassigned 68000.0
3 Diana 45.0 Marketing 71500.0
This is the recommended approach for production data pipelines because it gives you explicit control over what each column receives.
Forward Fill (ffill) and Backward Fill (bfill)
Time-series data and ordered datasets often benefit from propagation-based filling. Forward fill carries the last known value forward; backward fill takes the next known value backward.
import pandas as pd
import numpy as np
df = pd.DataFrame({
'date': pd.date_range('2026-01-01', periods=7, freq='D'),
'temperature': [22.1, np.nan, np.nan, 24.5, np.nan, 26.0, np.nan]
})
print("Original:")
print(df)
print("\nForward fill (ffill):")
print(df.fillna(method='ffill'))
print("\nBackward fill (bfill):")
print(df.fillna(method='bfill'))
Output:
Original:
date temperature
0 2026-01-01 22.1
1 2026-01-02 NaN
2 2026-01-03 NaN
3 2026-01-04 24.5
4 2026-01-05 NaN
5 2026-01-06 26.0
6 2026-01-07 NaN
Forward fill (ffill):
date temperature
0 2026-01-01 22.1
1 2026-01-02 22.1
2 2026-01-03 22.1
3 2026-01-04 24.5
4 2026-01-05 24.5
5 2026-01-06 26.0
6 2026-01-07 26.0
Backward fill (bfill):
date temperature
0 2026-01-01 22.1
1 2026-01-02 24.5
2 2026-01-03 24.5
3 2026-01-04 24.5
4 2026-01-05 26.0
5 2026-01-06 26.0
6 2026-01-07 NaN
Notice that backward fill leaves the last row as NaN because there is no subsequent value to pull from. You can combine both methods to close all gaps:
df_filled = df.fillna(method='ffill').fillna(method='bfill')
print(df_filled)
Note that pandas 2.1 deprecated the method parameter of fillna(). The dedicated df.ffill() and df.bfill() methods are the forward-compatible equivalents of fillna(method='ffill') and fillna(method='bfill').
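The same gap-closing chain with the dedicated methods:
# Equivalent to the chained fillna calls above, minus the
# deprecated method parameter:
df_filled = df.ffill().bfill()
print(df_filled)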
Limiting Propagation with limit
When a sensor drops out for days, forward-filling indefinitely can mask real data gaps. The limit parameter caps how many consecutive NaNs get filled:
import pandas as pd
import numpy as np
s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])
print("limit=1:")
print(s.fillna(method='ffill', limit=1))
print("\nlimit=2:")
print(s.fillna(method='ffill', limit=2))
Output:
limit=1:
0 1.0
1 1.0
2 NaN
3 NaN
4 5.0
dtype: float64
limit=2:
0 1.0
1 1.0
2 1.0
3 NaN
4 5.0
dtype: float64
This is critical for time-series data where you want to fill small gaps but flag longer outages for manual review.
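One useful pattern, sketched here with an illustrative limit, is to fill short gaps and then flag whatever remains for review:
import pandas as pd
import numpy as np
s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])
# Fill at most one consecutive gap, then report the leftovers.
filled = s.ffill(limit=1)
print("Positions needing manual review:", list(filled.index[filled.isna()]))
Output:
Positions needing manual review: [2, 3]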
fillna() with Mean, Median, and Mode
Statistical imputation replaces missing values with a summary statistic computed from the non-missing values in that column. This is the most common strategy for numerical features before feeding data into a model:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'math_score': [85, np.nan, 92, 78, np.nan, 88],
'reading_score': [np.nan, 76, 81, np.nan, 90, 85],
'grade': ['A', 'B', 'A', np.nan, 'B', np.nan]
})
# Fill numerical columns with their column mean
df['math_score'] = df['math_score'].fillna(df['math_score'].mean())
df['reading_score'] = df['reading_score'].fillna(df['reading_score'].median())
# Fill categorical column with mode (most frequent value)
df['grade'] = df['grade'].fillna(df['grade'].mode()[0])
print(df)
Output:
math_score reading_score grade
0 85.00 83.00 A
1 85.75 76.00 B
2 92.00 81.00 A
3 78.00 83.00 A
4 85.75 90.00 B
5 88.00 85.00 A
| Strategy | Best for | Notes |
|---|---|---|
| mean() | Numerical data with roughly symmetric distributions | Sensitive to outliers |
| median() | Numerical data with skewed distributions or outliers | More robust than mean |
| mode() | Categorical data or discrete numerical values | Returns the most common value; mode()[0] grabs the first if tied |
For machine learning pipelines, consider using sklearn.impute.SimpleImputer which integrates with scikit-learn pipelines and handles train/test split imputation correctly. You can also fill missing values per group using .groupby() combined with transform(), or use .apply() for custom per-column fill logic.
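As a sketch of the groupby-based approach (the column names here are illustrative): transform('mean') returns a Series aligned with the original index, which fillna() can consume directly.
import pandas as pd
import numpy as np
df = pd.DataFrame({
'department': ['Eng', 'Eng', 'Sales', 'Sales'],
'salary': [70000, np.nan, 50000, np.nan]
})
# Each missing salary gets its own department's mean,
# not the global mean.
df['salary'] = df['salary'].fillna(df.groupby('department')['salary'].transform('mean'))
print(df)
Output:
department salary
0 Eng 70000.0
1 Eng 70000.0
2 Sales 50000.0
3 Sales 50000.0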
interpolate() for Numerical Data
When data follows a trend (stock prices, sensor readings, growth metrics), interpolate() estimates missing values based on surrounding data points rather than using a flat fill:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'day': range(1, 8),
'revenue': [1000, np.nan, np.nan, 1600, np.nan, 2000, np.nan]
})
df['fillna_ffill'] = df['revenue'].fillna(method='ffill')
df['interpolated'] = df['revenue'].interpolate(method='linear')
print(df)
Output:
day revenue fillna_ffill interpolated
0 1 1000.0 1000.0 1000.0
1 2 NaN 1000.0 1200.0
2 3 NaN 1000.0 1400.0
3 4 1600.0 1600.0 1600.0
4 5 NaN 1600.0 1800.0
5 6 2000.0 2000.0 2000.0
6 7 NaN 2000.0 2000.0
Notice how interpolate() produces a smooth linear progression (1000, 1200, 1400, 1600, 1800, 2000) while ffill creates flat plateaus. Pandas supports multiple interpolation methods:
| Method | Description |
|---|---|
| 'linear' | Default. Draws a straight line between known points. |
| 'time' | Linear interpolation weighted by time index. |
| 'index' | Uses the actual numerical index values. |
| 'polynomial' | Fits a polynomial of specified order. |
| 'spline' | Fits a spline of specified order for smooth curves. |
Use interpolate() when the data has a natural ordering and trend. Use fillna() when you have a known replacement value or need propagation-based filling.
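To see how 'time' differs from the default, consider irregularly spaced timestamps (the dates below are illustrative):
import pandas as pd
import numpy as np
s = pd.Series(
[10.0, np.nan, 30.0],
index=pd.to_datetime(['2026-01-01', '2026-01-02', '2026-01-05'])
)
# Jan 2 sits one quarter of the way from Jan 1 to Jan 5, so
# 'time' yields 15.0 where positional 'linear' would give 20.0.
print(s.interpolate(method='time'))
Output:
2026-01-01 10.0
2026-01-02 15.0
2026-01-05 30.0
dtype: float64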
The inplace Parameter
Like most pandas methods, fillna() returns a new DataFrame by default. Setting inplace=True modifies the original:
import pandas as pd
import numpy as np
df = pd.DataFrame({'a': [1, np.nan, 3], 'b': [np.nan, 5, 6]})
# Method 1: assignment (recommended)
df_new = df.fillna(0)
print(f"Original unchanged: {df.isna().sum().sum()} NaNs")
print(f"New copy: {df_new.isna().sum().sum()} NaNs")
# Method 2: inplace (modifies original)
df.fillna(0, inplace=True)
print(f"After inplace: {df.isna().sum().sum()} NaNs")Output:
Original unchanged: 2 NaNs
New copy: 0 NaNs
After inplace: 0 NaNs
Modern pandas best practice favors assignment over inplace=True because assignment works naturally in method chains and makes data flow explicit.
Comparison: fillna() vs dropna() vs interpolate()
Choosing the right missing-data strategy depends on your dataset, the missingness pattern, and your downstream use case. Here is a side-by-side comparison:
| Aspect | fillna() | dropna() | interpolate() |
|---|---|---|---|
| What it does | Replaces NaN with a specified value | Removes rows or columns containing NaN | Estimates NaN from surrounding values |
| Row count | Preserved | Reduced | Preserved |
| Best for | Known replacement values, categorical data, statistical imputation | Small percentage of missing rows, or when imputation would distort analysis | Ordered/time-series numerical data with a natural trend |
| Risk | Introduces bias if fill value is poorly chosen | Loses data; can bias results if missingness is not random | Assumes a smooth underlying pattern that may not exist |
| Typical use case | Fill missing survey answers with "No response", fill prices with column mean | Drop rows with no target variable before model training | Fill gaps in daily stock prices or temperature readings |
| Handles categorical data | Yes | Yes (by dropping) | No (numerical only) |
| Chain-friendly | Yes | Yes | Yes |
Decision rule of thumb:
- If less than 5% of rows are missing and the data is missing completely at random, dropna() is safe.
- If you have a meaningful default or can compute a reasonable statistic, use fillna().
- If the data is ordered and numerical with a trend, use interpolate().
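A quick audit along these lines can support that decision (the column names and threshold are illustrative):
import pandas as pd
import numpy as np
df = pd.DataFrame({
'target': [1.0, np.nan, 3.0, 4.0],
'feature': [np.nan, np.nan, 0.5, 0.7]
})
# Fraction missing per column: low fractions are dropna()
# candidates, heavier ones call for fillna() or interpolate().
print(df.isna().mean())
Output:
target 0.25
feature 0.50
dtype: float64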
fillna() on Specific Columns
You do not always want to fill the entire DataFrame. Apply fillna() to individual columns or a subset:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'city': ['NYC', None, 'LA', None, 'Chicago'],
'temperature': [32.1, np.nan, 75.3, np.nan, 28.5],
'humidity': [45, 60, np.nan, np.nan, 55]
})
# Fill only the city column
df['city'] = df['city'].fillna('Unknown')
# Fill only the temperature column with its mean
df['temperature'] = df['temperature'].fillna(df['temperature'].mean())
# Leave humidity NaNs untouched for now
print(df)
Output:
city temperature humidity
0 NYC 32.100000 45.0
1 Unknown 45.300000 60.0
2 LA 75.300000 NaN
3 Unknown 45.300000 NaN
4 Chicago 28.500000 55.0
This selective approach is important when different columns require different treatment -- or when some missing values are intentional (e.g., humidity might not apply to indoor measurements).
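If you later decide the humidity gaps should be filled after all, one way (sketched here) is to select a column subset and fill it in a single pass:
# Median-fill the remaining gaps across a column subset;
# the median Series aligns on column labels.
cols = ['temperature', 'humidity']
df[cols] = df[cols].fillna(df[cols].median())
print(df['humidity'].tolist())
Output:
[45.0, 60.0, 55.0, 55.0, 55.0]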
Chaining fillna() with Other Operations
Pandas method chaining lets you build readable data pipelines. fillna() fits naturally into these chains:
import pandas as pd
import numpy as np
raw = pd.DataFrame({
'customer_id': [101, 102, 101, 103, 102, 104],
'purchase': [25.0, np.nan, 30.0, np.nan, 15.0, np.nan],
'channel': ['web', 'store', None, 'web', None, 'store']
})
result = (
raw
.fillna({'purchase': 0, 'channel': 'unknown'})
.drop_duplicates(subset=['customer_id'], keep='first')
.sort_values('customer_id')
.reset_index(drop=True)
)
print(result)
Output:
customer_id purchase channel
0 101 25.0 web
1 102 0.0 store
2 103 0.0 web
3 104 0.0 store
This pipeline fills missing values, deduplicates by customer ID, sorts, and resets the index in a single readable expression.
Real-World Pipeline: Cleaning Sales Data
Here is a more realistic chain that combines multiple cleaning steps:
import pandas as pd
import numpy as np
sales = pd.DataFrame({
'date': ['2026-01-01', '2026-01-02', '2026-01-03', '2026-01-04', '2026-01-05'],
'product': ['Widget', None, 'Widget', 'Gadget', None],
'units': [10, np.nan, 15, np.nan, 8],
'unit_price': [9.99, 9.99, np.nan, 14.99, np.nan],
'region': ['East', 'East', None, 'West', 'West']
})
clean = (
sales
.assign(date=lambda d: pd.to_datetime(d['date']))
.fillna({
'product': 'Unknown',
'region': 'Unassigned',
'units': sales['units'].median(),
'unit_price': sales['unit_price'].median()
})
.assign(total=lambda d: d['units'] * d['unit_price'])
.sort_values('date')
.reset_index(drop=True)
)
print(clean)
Output:
date product units unit_price region total
0 2026-01-01 Widget 10.0 9.99 East 99.90
1 2026-01-02 Unknown 10.0 9.99 East 99.90
2 2026-01-03 Widget 15.0 9.99 Unassigned 149.85
3 2026-01-04 Gadget 10.0 14.99 West 149.90
4 2026-01-05 Unknown 8.0 9.99 West 79.92
The assign() calls create or transform columns, fillna() handles the gaps, and the chain flows top to bottom in logical order.
Visualize Missing Data Patterns with PyGWalker
Before choosing a fill strategy, it helps to see where the missing values are concentrated. Are they scattered randomly, clustered in certain columns, or correlated with specific time periods? Visual inspection often reveals patterns that summary statistics miss.
PyGWalker is an open-source Python library that turns any pandas DataFrame into an interactive, Tableau-like visualization interface directly in Jupyter Notebook. You can drag columns onto axes, switch chart types, and filter data with clicks instead of writing matplotlib boilerplate.
import pandas as pd
import pygwalker as pyg
# Load your data and mark missing patterns
df = pd.read_csv('your_data.csv')
# Add a column counting missing values per row
df['missing_count'] = df.isna().sum(axis=1)
# Launch interactive explorer
walker = pyg.walk(df)
Inside the PyGWalker interface, you can create bar charts showing the count of missing values per column, heatmaps revealing which rows have the most gaps, and scatter plots to check if missingness correlates with other variables. This kind of visual audit often changes which fill strategy you choose.
Install PyGWalker with pip install pygwalker, or try it in Google Colab.
FAQ
What is the difference between fillna() and dropna()?
fillna() replaces missing values with a value you specify, keeping all rows intact. dropna() removes entire rows (or columns) that contain missing values. Use fillna() when you have a reasonable replacement value and want to preserve your row count. Use dropna() when the missing rows are few and imputation would introduce unacceptable bias.
Can I fill NaN values with the mean of a column?
Yes. Use df['column'] = df['column'].fillna(df['column'].mean()). This computes the mean from the non-missing values and fills every NaN in that column with the result. For skewed data, median() is often a better choice because it is less affected by extreme outliers.
What does the limit parameter do in fillna()?
The limit parameter caps the maximum number of consecutive NaN values that get filled. For example, df.fillna(method='ffill', limit=2) (or the modern equivalent df.ffill(limit=2)) will forward-fill at most 2 consecutive gaps. Any longer sequence of missing values will be only partially filled, leaving the remaining gaps as NaN. This is useful for time-series data where you want to fill short gaps but flag extended outages.
How do I fill NaN with different values for different columns?
Pass a dictionary to fillna() where keys are column names and values are the fill values: df.fillna({'age': 0, 'name': 'Unknown', 'salary': df['salary'].median()}). Each column gets its own fill value, and columns not listed in the dictionary are left unchanged.
Does fillna() change the original DataFrame?
No, by default fillna() returns a new DataFrame and the original remains unchanged. To modify the original, either use assignment (df = df.fillna(0)) or pass inplace=True. The assignment approach is recommended because it works with method chaining and makes the data flow explicit.
Conclusion
Missing values are inevitable in real-world data. The pandas fillna() method gives you precise control over how to handle them:
- Use scalar fillna for simple, uniform replacements across the entire DataFrame.
- Use dictionary fillna to apply different fill strategies per column -- the most common pattern in production code.
- Use forward fill (ffill) and backward fill (bfill) for ordered and time-series data where propagating known values makes sense.
- Use mean, median, or mode for statistical imputation of numerical and categorical columns.
- Use interpolate() when the data follows a natural trend and you want smooth estimated values rather than flat fills.
- Use the limit parameter to prevent propagation-based methods from filling excessively long gaps.
- Prefer assignment over inplace=True for cleaner, more readable code.
- Always detect and audit missing values with isna() and notna() before choosing a fill strategy.
Once your missing values are handled, tools like PyGWalker let you interactively explore the cleaned data without writing chart code -- helping you verify that your fill logic produced sensible results and move straight into analysis.
Related Guides
- Pandas Missing Values: Complete Guide -- broader overview of detecting, analyzing, and handling missing data
- Remove Duplicate Rows -- clean duplicates alongside missing values
- Pandas GroupBy -- fill missing values per group with groupby + transform
- Pandas Apply -- apply custom fill logic across rows or columns
- Pandas Data Cleaning Guide -- end-to-end data cleaning workflow