NumPy Where: Conditional Array Operations Made Simple

You need to replace every negative number in a 10-million-element array with zero. Or flag every temperature reading above 100 as an outlier. Or map exam scores into letter grades. Writing a Python for-loop to handle these operations works -- until it takes 30 seconds to run on a large dataset. numpy.where() solves this problem by applying conditional logic across entire arrays at C-compiled speed, collapsing what would be a multi-line loop into a single, readable expression.

What This Guide Covers

This guide walks through every practical use of numpy.where(): the three-argument form for conditional replacement, the single-argument form for finding indices, combining multiple conditions, working with multidimensional arrays, nesting calls for multi-category logic, and comparing np.where against alternatives like np.select and boolean indexing. Each section includes runnable code and explanations of when to use which approach.

How np.where() Works: The Two Forms

numpy.where() has two distinct calling signatures that serve different purposes.

Form 1: Three Arguments (condition, x, y)

numpy.where(condition, x, y)

This returns an array where each element comes from x if the corresponding condition is True, or from y if the condition is False. Think of it as a vectorized ternary operator applied element-by-element.

import numpy as np
 
scores = np.array([45, 78, 92, 33, 67, 88, 51])
 
# "Pass" if score >= 60, otherwise "Fail"
result = np.where(scores >= 60, "Pass", "Fail")
print(result)
# ['Fail' 'Pass' 'Pass' 'Fail' 'Pass' 'Pass' 'Fail']
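
To make the "vectorized ternary" idea concrete, here is a minimal sketch comparing the call above with its pure-Python equivalent. The names vectorized and looped are illustrative only.

```python
import numpy as np

scores = np.array([45, 78, 92, 33, 67, 88, 51])

# Vectorized form from the example above
vectorized = np.where(scores >= 60, "Pass", "Fail")

# Pure-Python equivalent: one ternary expression per element
looped = np.array(["Pass" if s >= 60 else "Fail" for s in scores])

print(np.array_equal(vectorized, looped))
# True
```

Both produce the same array; the difference is that np.where does the per-element work in compiled code rather than in the Python interpreter.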

The condition, x, and y can each be arrays or scalars. When they are arrays, they must be broadcastable to the same shape.

import numpy as np
 
a = np.array([10, 20, 30, 40, 50])
b = np.array([55, 15, 35, 45, 5])
 
# Pick the larger value from each pair
result = np.where(a > b, a, b)
print(result)
# [55 20 35 45 50]

Form 2: One Argument (condition only)

numpy.where(condition)

When called with only a condition, np.where() returns a tuple of arrays -- one per dimension -- containing the indices where the condition is True. This is equivalent to np.nonzero(condition).

import numpy as np
 
temps = np.array([98.1, 100.4, 97.5, 103.2, 99.0, 101.8])
 
# Find indices where temperature exceeds 100
hot_indices = np.where(temps > 100)
print(hot_indices)
# (array([1, 3, 5]),)
 
# Use the indices to get the actual values
print(temps[hot_indices])
# [100.4 103.2 101.8]

The result is always a tuple, even for 1D arrays. For a 1D array, hot_indices[0] gives the flat indices. For 2D arrays, you get row indices and column indices as separate arrays.
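
As a small illustration of the tuple result, the sketch below uses a made-up 2D array (grid) to show that np.where(condition) and np.nonzero(condition) agree, and that np.argwhere gives the same positions as coordinate pairs.

```python
import numpy as np

grid = np.array([
    [0, 4, 0],
    [7, 0, 2],
])

rows, cols = np.where(grid > 0)          # tuple of index arrays, one per axis
nz_rows, nz_cols = np.nonzero(grid > 0)  # same result via np.nonzero

print(rows, cols)
# [0 1 1] [1 0 2]

# np.argwhere returns the same positions as (row, col) pairs
print(np.argwhere(grid > 0))
# [[0 1]
#  [1 0]
#  [1 2]]
```

The tuple form is convenient for indexing (grid[rows, cols]), while the paired form is easier to iterate over.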

The Three-Argument Form in Depth

Replacing Values Based on a Condition

The most common use case is replacing values that meet (or fail) a condition.

import numpy as np
 
data = np.array([-5, 3, -1, 7, -8, 2, 0, -3])
 
# Replace negatives with zero
cleaned = np.where(data < 0, 0, data)
print(cleaned)
# [0 3 0 7 0 2 0 0]

Notice that the original array is unchanged. np.where() returns a new array.
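
The following sketch checks that claim on a shortened version of the array and contrasts it with in-place assignment through a boolean mask. The in_place copy exists only so the demo can keep the original for comparison.

```python
import numpy as np

data = np.array([-5, 3, -1, 7])

cleaned = np.where(data < 0, 0, data)  # new array; data is untouched
print(data)
# [-5  3 -1  7]
print(cleaned)
# [0 3 0 7]

# In-place alternative: boolean-mask assignment overwrites the array itself
in_place = data.copy()        # copy only so the demo can keep the original
in_place[in_place < 0] = 0
print(in_place)
# [0 3 0 7]
```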

Using Arrays for Both x and y

When both x and y are arrays, np.where picks elements from one or the other based on the condition.

import numpy as np
 
base_prices = np.array([100, 200, 150, 300, 250])
sale_prices = np.array([80, 180, 120, 250, 200])
is_on_sale = np.array([True, False, True, True, False])
 
final_prices = np.where(is_on_sale, sale_prices, base_prices)
print(final_prices)
# [ 80 200 120 250 250]

Type Behavior

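The output dtype is determined by NumPy's type promotion rules applied to x and y. Mixing integers and floats produces a float array. Mixing numbers and strings generally produces a string (Unicode) array in which the numbers have been converted to text, not an object array, so numeric operations on the result will no longer work.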

import numpy as np
 
arr = np.array([1, 2, 3, 4, 5])
 
# int array with a float replacement value -> float output
result = np.where(arr > 3, arr, 0.0)
print(result.dtype)
# float64
print(result)
# [0. 0. 0. 4. 5.]
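
If an integer result is needed, one option is to keep the fill value an integer, or to cast back afterwards. This is a minimal sketch of both approaches; as_int and as_float are illustrative names.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# An integer fill value keeps the integer dtype of arr
as_int = np.where(arr > 3, arr, 0)
print(as_int.dtype == arr.dtype)
# True

# A float fill value promotes the whole result to float64
as_float = np.where(arr > 3, arr, 0.0)
print(as_float.dtype)
# float64

# Cast back explicitly if an integer result is required
print(as_float.astype(arr.dtype))
# [0 0 0 4 5]
```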

Finding Indices with the Single-Argument Form

1D Arrays

import numpy as np
 
data = np.array([12, 5, 8, 19, 3, 15, 7])
 
# Get indices of values greater than 10
indices = np.where(data > 10)[0]
print(indices)
# [0 3 5]
 
print(data[indices])
# [12 19 15]

2D Arrays

For 2D arrays, np.where() returns two arrays: row indices and column indices.

import numpy as np
 
matrix = np.array([
    [1, 0, 3],
    [0, 5, 0],
    [7, 0, 9]
])
 
# Find all non-zero elements
rows, cols = np.where(matrix != 0)
print("Row indices:", rows)
print("Col indices:", cols)
# Row indices: [0 0 1 2 2]
# Col indices: [0 2 1 0 2]
 
# Pair them up
for r, c in zip(rows, cols):
    print(f"  matrix[{r},{c}] = {matrix[r, c]}")
# matrix[0,0] = 1
# matrix[0,2] = 3
# matrix[1,1] = 5
# matrix[2,0] = 7
# matrix[2,2] = 9

Combining Multiple Conditions

Real filtering often requires more than one condition. Use & (and), | (or), and ~ (not) to combine conditions. Each individual condition must be wrapped in parentheses.

Using & (AND)

import numpy as np
 
ages = np.array([15, 22, 35, 12, 45, 67, 29, 8])
 
# Find people aged 18-65
working_age = np.where((ages >= 18) & (ages <= 65), "Working age", "Other")
print(working_age)
# ['Other' 'Working age' 'Working age' 'Other' 'Working age' 'Other'
#  'Working age' 'Other']

Using | (OR)

import numpy as np
 
values = np.array([3, -7, 15, 0, -2, 22, 8, -11])
 
# Flag values that are negative OR above 20
flagged = np.where((values < 0) | (values > 20), True, False)
print(flagged)
# [False  True False False  True  True False  True]
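
Note that when the two outcomes are simply True and False, the combined condition is already the array you want. A short sketch (mask is an illustrative name):

```python
import numpy as np

values = np.array([3, -7, 15, 0, -2, 22, 8, -11])

mask = (values < 0) | (values > 20)            # already a boolean array
flagged = np.where(mask, True, False)          # same content as mask

print(np.array_equal(mask, flagged))
# True

# The mask can be used directly, for example to count or extract flagged values
print(mask.sum())
# 4
print(values[mask])
# [ -7  -2  22 -11]
```

np.where becomes useful here once the outcomes are something other than the booleans themselves, such as labels or replacement values.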

Using ~ (NOT)

import numpy as np
 
data = np.array([1.0, np.nan, 3.0, np.nan, 5.0])
 
# Replace NaN with 0
cleaned = np.where(~np.isnan(data), data, 0)
print(cleaned)
# [1. 0. 3. 0. 5.]

Common Mistake: Forgetting Parentheses

This is one of the most frequent bugs when using np.where with multiple conditions. Python's operator precedence means & binds tighter than comparison operators.

import numpy as np
 
arr = np.array([5, 10, 15, 20, 25])
 
# WRONG - raises ValueError (the truth value of an array is ambiguous)
# np.where(arr > 5 & arr < 20, "yes", "no")
# This is parsed as the chained comparison: arr > (5 & arr) < 20
 
# CORRECT - parentheses around each condition
result = np.where((arr > 5) & (arr < 20), "yes", "no")
print(result)
# ['no' 'yes' 'yes' 'no' 'no']

Always wrap each condition in its own parentheses when combining with &, |, or ~.
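
To see the failure mode directly, the sketch below runs the unparenthesized version inside a try/except and then shows np.logical_and as an alternative spelling that sidesteps operator precedence altogether.

```python
import numpy as np

arr = np.array([5, 10, 15, 20, 25])

try:
    # Parsed as the chained comparison arr > (5 & arr) < 20,
    # which needs the truth value of a whole array
    np.where(arr > 5 & arr < 20, "yes", "no")
except ValueError as exc:
    print("ValueError:", exc)

# np.logical_and is an alternative spelling that avoids the precedence issue
result = np.where(np.logical_and(arr > 5, arr < 20), "yes", "no")
print(result)
# ['no' 'yes' 'yes' 'no' 'no']
```

np.logical_or and np.logical_not play the same role for | and ~.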

np.where with Multidimensional Arrays

np.where() works seamlessly on arrays of any shape. The condition, x, and y are evaluated element-by-element across all dimensions.

import numpy as np
 
matrix = np.array([
    [10, 25, 30],
    [45, 50, 15],
    [60, 5,  20]
])
 
# Replace values below 20 with -1
result = np.where(matrix >= 20, matrix, -1)
print(result)
# [[-1 25 30]
#  [45 50 -1]
#  [60 -1 20]]

Broadcasting with np.where

You can use broadcasting to apply conditions across different shapes.

import numpy as np
 
# 3x4 matrix of student scores
scores = np.array([
    [85, 62, 91, 78],
    [55, 73, 88, 42],
    [96, 81, 67, 59]
])
 
# Passing threshold per subject (1D, broadcasts across rows)
thresholds = np.array([60, 70, 75, 50])
 
# Check if each student passes each subject
passed = np.where(scores >= thresholds, "P", "F")
print(passed)
# [['P' 'F' 'P' 'P']
#  ['F' 'P' 'P' 'F']
#  ['P' 'P' 'F' 'P']]
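
Broadcasting along the other axis needs one extra step. The sketch below assumes a hypothetical set of per-student targets (one value per row) and reshapes it to a column so that it lines up with the rows rather than the columns.

```python
import numpy as np

scores = np.array([
    [85, 62, 91, 78],
    [55, 73, 88, 42],
    [96, 81, 67, 59],
])

# Hypothetical per-student targets, one per row
targets = np.array([80, 60, 90])

# Shape (3,) would be matched against the 4 columns and fail to broadcast,
# so reshape to (3, 1) to line the targets up with the rows instead
met_target = np.where(scores >= targets[:, np.newaxis], "Y", "N")
print(met_target)
# [['Y' 'N' 'Y' 'N']
#  ['N' 'Y' 'Y' 'N']
#  ['Y' 'N' 'N' 'N']]
```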

Nested np.where for Multiple Categories

When you need more than two outcomes, you can nest np.where() calls. This is the NumPy equivalent of if-elif-else chains.

import numpy as np
 
scores = np.array([95, 82, 74, 65, 58, 43, 89, 71])
 
grades = np.where(scores >= 90, "A",
         np.where(scores >= 80, "B",
         np.where(scores >= 70, "C",
         np.where(scores >= 60, "D", "F"))))
 
print(grades)
# ['A' 'B' 'C' 'D' 'F' 'F' 'B' 'C']

This works, but it gets hard to read beyond two or three levels. For more categories, np.select is a better choice.

np.where vs np.select: When You Have Many Conditions

np.select() accepts a list of conditions and a list of corresponding values. It is cleaner and more maintainable than deeply nested np.where().

import numpy as np
 
scores = np.array([95, 82, 74, 65, 58, 43, 89, 71])
 
conditions = [
    scores >= 90,
    scores >= 80,
    scores >= 70,
    scores >= 60,
]
choices = ["A", "B", "C", "D"]
 
grades = np.select(conditions, choices, default="F")
print(grades)
# ['A' 'B' 'C' 'D' 'F' 'F' 'B' 'C']

np.select evaluates conditions in order and picks the first matching one, just like an if-elif chain.
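
Because the first match wins, the order of the condition list matters. This small sketch (with a shortened scores array) shows what happens when the broadest condition is listed first.

```python
import numpy as np

scores = np.array([95, 82, 58])

# Broadest condition first: every passing score matches it and stops there
wrong_order = np.select(
    [scores >= 60, scores >= 80, scores >= 90],
    ["D", "B", "A"],
    default="F",
)
print(wrong_order)
# ['D' 'D' 'F']

# Most specific condition first gives the intended grades
right_order = np.select(
    [scores >= 90, scores >= 80, scores >= 60],
    ["A", "B", "D"],
    default="F",
)
print(right_order)
# ['A' 'B' 'F']
```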

Comparison: np.where vs Alternatives

| Method | Best For | Speed | Readability | Handles N Conditions |
|---|---|---|---|---|
| np.where(cond, x, y) | Binary condition (if/else) | Fast (vectorized) | High | 2 (nest for more) |
| np.select(conds, vals) | Multiple categories (if/elif/else) | Fast (vectorized) | High for 3+ conditions | Unlimited |
| Boolean indexing arr[mask] | Extracting/modifying a subset | Fast (vectorized) | High for simple cases | 1 at a time |
| List comprehension | Complex per-element logic | Slow (Python loop) | Medium | Unlimited |
| np.vectorize(func) | Arbitrary Python function | Slow (not truly vectorized) | Medium | Unlimited |

Boolean Indexing vs np.where

Boolean indexing and np.where overlap in many cases but behave differently.

import numpy as np
 
data = np.array([10, 25, 5, 30, 15, 40])
 
# Boolean indexing: returns only the matching elements (shorter array)
filtered = data[data > 20]
print(filtered)
# [25 30 40]
 
# np.where: returns a same-shape array with replacements
replaced = np.where(data > 20, data, 0)
print(replaced)
# [ 0 25  0 30  0 40]

Use boolean indexing when you want to extract a subset. Use np.where when you want to keep the original shape and replace non-matching elements.

Performance: np.where vs Python Loops

np.where operates at compiled C speed. The difference is dramatic on large arrays.

import numpy as np
import time
 
size = 10_000_000
data = np.random.randn(size)
 
# Method 1: np.where
# time.perf_counter() is a monotonic, high-resolution clock suited to timing code
start = time.perf_counter()
result_np = np.where(data > 0, data, 0)
np_time = time.perf_counter() - start

# Method 2: Python list comprehension
start = time.perf_counter()
result_list = np.array([x if x > 0 else 0 for x in data])
list_time = time.perf_counter() - start

print(f"np.where:           {np_time:.4f}s")
print(f"List comprehension: {list_time:.4f}s")
print(f"Speedup:            {list_time / np_time:.0f}x")

Typical output on a modern machine:

| Method | Time (10M elements) | Relative Speed |
|---|---|---|
| np.where | ~0.03s | 1x (baseline) |
| List comprehension | ~3.5s | ~100x slower |
| Python for-loop | ~5.0s | ~150x slower |

The performance advantage grows with array size. For arrays with fewer than a few hundred elements, the overhead of NumPy's array creation can make Python loops competitive. For anything larger, np.where wins decisively.
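
For a more repeatable measurement, the standard-library timeit module can run each approach several times and keep the best result. This is a sketch under assumptions: the array is shrunk to 100,000 elements so the loop finishes quickly, and the helper names with_where and with_loop are illustrative. Actual timings depend on the machine, so none are shown.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal(100_000)  # smaller than 10M so the loop finishes quickly

def with_where():
    return np.where(data > 0, data, 0)

def with_loop():
    return np.array([x if x > 0 else 0 for x in data])

# Best of three runs for each approach
t_where = min(timeit.repeat(with_where, number=1, repeat=3))
t_loop = min(timeit.repeat(with_loop, number=1, repeat=3))

print(f"np.where: {t_where:.5f}s, list comprehension: {t_loop:.5f}s")
print("Same result:", np.array_equal(with_where(), with_loop()))
# Same result: True
```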

Real-World Examples

Data Cleaning: Handling Missing and Invalid Values

import numpy as np
 
# Sensor readings with -999 as a "missing" marker
readings = np.array([23.5, -999, 24.1, 25.0, -999, 22.8, -999, 24.5])
 
# Replace -999 with NaN for proper statistical treatment
cleaned = np.where(readings == -999, np.nan, readings)
print(cleaned)
# [23.5  nan 24.1 25.   nan 22.8  nan 24.5]
 
# Now compute the mean, ignoring NaN
print(np.nanmean(cleaned))
# 23.98 (approximately; the last digits depend on floating-point rounding)

Feature Engineering: Creating Categorical Features

import numpy as np
 
# Customer purchase amounts
purchases = np.array([5, 150, 45, 500, 20, 1200, 75, 3000])
 
# Create spending tiers
tiers = np.select(
    [purchases >= 1000, purchases >= 100, purchases >= 0],
    ["Premium", "Standard", "Basic"],
    default="Unknown"
)
print(tiers)
# ['Basic' 'Standard' 'Basic' 'Standard' 'Basic' 'Premium' 'Basic' 'Premium']

Replacing Outliers with Bounds (Winsorizing)

import numpy as np
 
np.random.seed(42)
data = np.random.normal(100, 15, size=1000)
 
# Add some extreme outliers
data[0] = 250
data[1] = 10
 
# Clip to [mean - 2*std, mean + 2*std] using np.where
mean = np.mean(data)
std = np.std(data)
lower = mean - 2 * std
upper = mean + 2 * std
 
clipped = np.where(data < lower, lower, np.where(data > upper, upper, data))
 
print(f"Before: min={data.min():.1f}, max={data.max():.1f}")
print(f"After:  min={clipped.min():.1f}, max={clipped.max():.1f}")
# Before: min=10.0, max=250.0
# After:  min and max now equal lower and upper (mean - 2*std and mean + 2*std)
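
The nested np.where call above is shown for illustration; NumPy also provides np.clip for exactly this two-sided bound. The sketch below checks that the two give identical results. It uses np.random.default_rng instead of np.random.seed, so the sample differs from the one above, and the names clipped_where and clipped_clip are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(100, 15, size=1000)
data[0], data[1] = 250, 10  # inject two extreme values

lower = data.mean() - 2 * data.std()
upper = data.mean() + 2 * data.std()

clipped_where = np.where(data < lower, lower, np.where(data > upper, upper, data))
clipped_clip = np.clip(data, lower, upper)

print(np.array_equal(clipped_where, clipped_clip))
# True
```

np.clip is usually the more readable choice when both bounds are simple values; np.where remains useful when the replacement is something other than the bound itself.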

Conditional Calculations with Different Formulas

import numpy as np
 
# Tax calculation with progressive brackets
income = np.array([25000, 50000, 85000, 120000, 200000])
 
# Simplified tax brackets:
# 0-30k: 10%, 30k-80k: 20%, 80k+: 30%
tax = np.select(
    [income <= 30000, income <= 80000, income > 80000],
    [
        income * 0.10,
        3000 + (income - 30000) * 0.20,
        3000 + 10000 + (income - 80000) * 0.30,
    ]
)
 
for inc, t in zip(income, tax):
    print(f"Income: ${inc:>7,} -> Tax: ${t:>8,.0f}")
# Income: $ 25,000 -> Tax: $   2,500
# Income: $ 50,000 -> Tax: $   7,000
# Income: $ 85,000 -> Tax: $  14,500
# Income: $120,000 -> Tax: $  25,000
# Income: $200,000 -> Tax: $  49,000
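
One caveat with this pattern: np.where and np.select receive fully computed arrays, so every formula is evaluated for every element before any selection happens. That is harmless for the tax formulas above, but it matters when a branch can fail, such as a division by zero. The following sketch (numer and denom are made-up arrays) contrasts np.where with np.divide and its where= argument, which skips the excluded elements.

```python
import numpy as np

numer = np.array([1.0, 2.0, 3.0])
denom = np.array([2.0, 0.0, 4.0])

# np.where evaluates numer / denom for every element first, so the division
# by zero still happens (NumPy emits a RuntimeWarning) even though that
# element is later replaced by 0.0
with np.errstate(divide="ignore", invalid="ignore"):
    via_where = np.where(denom != 0, numer / denom, 0.0)

# np.divide with where= skips the masked-out elements entirely;
# out= supplies the values used wherever the division is skipped
via_divide = np.divide(numer, denom, out=np.zeros_like(numer), where=denom != 0)

print(via_where)
# [0.5  0.   0.75]
print(via_divide)
# [0.5  0.   0.75]
```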

Working with DataFrames (Pandas + NumPy)

np.where accepts pandas Series directly and is a standard tool for creating derived columns. It returns a plain NumPy array, which pandas then stores as the new column.

import numpy as np
import pandas as pd
 
df = pd.DataFrame({
    "product": ["Widget A", "Widget B", "Widget C", "Widget D"],
    "revenue": [1200, 800, 3500, 150],
    "cost": [400, 900, 1500, 200],
})
 
# Add a profit/loss column
df["profit"] = df["revenue"] - df["cost"]
df["status"] = np.where(df["profit"] > 0, "Profit", "Loss")
 
print(df)
#     product  revenue  cost  profit status
# 0  Widget A     1200   400     800  Profit
# 1  Widget B      800   900    -100    Loss
# 2  Widget C     3500  1500    2000  Profit
# 3  Widget D      150   200     -50    Loss

If you are working with pandas DataFrames and want to go beyond tabular inspection, PyGWalker lets you turn any DataFrame into an interactive drag-and-drop visualization interface, similar to Tableau but inside your Jupyter notebook. After you have used np.where or np.select to engineer new columns, you can explore the results visually with a single function call:

import pygwalker as pyg
 
# After data cleaning and feature engineering with np.where
walker = pyg.walk(df)

np.where with String Arrays

np.where also works with string arrays: the condition can come from a numeric array while x and y are arrays of strings.

import numpy as np
 
names = np.array(["Alice", "Bob", "Charlie", "Diana"])
scores = np.array([92, 58, 74, 85])
 
# Create a result message per student
messages = np.where(
    scores >= 60,
    np.char.add(names, " passed"),
    np.char.add(names, " failed")
)
print(messages)
# ['Alice passed' 'Bob failed' 'Charlie passed' 'Diana passed']

Quick Reference: np.where Syntax Cheat Sheet

import numpy as np
 
arr = np.array([1, 5, 3, 8, 2, 7])
 
# 1. Replace values: np.where(condition, if_true, if_false)
np.where(arr > 4, arr, 0)           # [0 5 0 8 0 7]
 
# 2. Find indices: np.where(condition)
np.where(arr > 4)                    # (array([1, 3, 5]),)
 
# 3. Multiple AND conditions
np.where((arr > 2) & (arr < 7), arr, -1)  # [-1 5 3 -1 -1 -1]
 
# 4. Multiple OR conditions
np.where((arr < 2) | (arr > 6), arr, 0)   # [1 0 0 8 0 7]
 
# 5. Nested np.where
np.where(arr > 6, "high",
    np.where(arr > 3, "mid", "low"))       # ['low' 'mid' 'low' 'high' 'low' 'high']
 
# 6. Replace NaN
data = np.array([1.0, np.nan, 3.0])
np.where(np.isnan(data), 0, data)          # [1. 0. 3.]

FAQ

What does numpy.where() return?

With three arguments (condition, x, y), it returns a new array whose shape is the broadcast shape of condition, x, and y (in the common case, the same shape as the condition), where each element is picked from x if the condition is True or from y if False. With one argument (condition), it returns a tuple of index arrays showing where the condition is True.

How do I use np.where with multiple conditions?

Combine conditions using & (and), | (or), and ~ (not). Each condition must be wrapped in parentheses: np.where((arr > 5) & (arr < 20), 'yes', 'no'). Forgetting the parentheses is one of the most common mistakes.

What is the difference between np.where and np.select?

np.where handles a single condition with two outcomes (if/else). np.select accepts a list of conditions and corresponding values, making it better for three or more categories. Use np.where for binary decisions and np.select for multi-category classification.

Does np.where modify the original array?

No. np.where() always returns a new array. The original array remains unchanged. If you want to modify the original in-place, use boolean indexing: arr[arr < 0] = 0.

Is np.where faster than a for loop?

Yes, significantly. np.where is a vectorized operation executed in compiled C code. On a 10-million-element array, it typically runs 50-150x faster than a Python for-loop or list comprehension.

Can np.where handle NaN values?

Yes, but you need to use np.isnan() as the condition because NaN != NaN in IEEE floating-point arithmetic. Example: np.where(np.isnan(data), 0, data) replaces all NaN values with zero.
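
A short sketch of why the equality test fails and what works instead. np.nan_to_num is included as a built-in alternative for the same replacement.

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0, np.nan])

# Equality never matches NaN, so this condition is all False
print(data == np.nan)
# [False False False False]

# np.isnan finds the NaN entries
print(np.isnan(data))
# [False  True False  True]

print(np.where(np.isnan(data), 0, data))
# [1. 0. 3. 0.]

# np.nan_to_num is a built-in shortcut for the same replacement
print(np.nan_to_num(data, nan=0.0))
# [1. 0. 3. 0.]
```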

Conclusion

numpy.where() is one of the most versatile functions in the NumPy toolkit. It replaces verbose loops with concise, readable expressions that run orders of magnitude faster. The key takeaways:

  • Two forms: Three arguments (condition, x, y) for conditional replacement. One argument (condition) for finding indices.
  • Multiple conditions: Combine with &, |, ~ and always use parentheses around each condition.
  • Nesting limit: For more than two or three categories, switch to np.select for cleaner code.
  • Performance: 50-150x faster than Python loops on large arrays.
  • Works everywhere: Scalars, 1D arrays, multidimensional arrays, and pandas Series.

Whether you are cleaning sensor data, engineering features for machine learning, or building conditional logic into numerical pipelines, np.where is the function you will reach for first.
