Pandas loc: Select and Filter DataFrame Rows and Columns by Label

DataFrame.loc[] is the primary label-based indexer in pandas. It selects rows and columns by their index labels or boolean arrays, returning anything from a single scalar value to a full DataFrame subset.

Most pandas beginners start with bracket notation like df['column'] or df[0:5], then quickly hit walls: they need to select specific rows by name, filter rows while picking certain columns, or update values in-place without triggering warnings. The .loc[] accessor solves all of these problems with a single, consistent interface.

This guide covers every common .loc[] pattern -- from basic row selection to MultiIndex lookups, boolean filtering with compound conditions, and safe value assignment. Each section includes runnable code so you can test directly in your notebook.

Quick Syntax Reference

The general form of .loc[] is:

df.loc[row_selection, column_selection]

Both row_selection and column_selection accept:

  • A single label: 'row_a' or 'col_x'
  • A list of labels: ['row_a', 'row_c']
  • A slice of labels: 'row_a':'row_d' (inclusive on both ends)
  • A boolean array or Series
  • A callable that takes the DataFrame/Series and returns one of the above

If you omit column_selection, .loc[] returns all columns.
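
As a quick sanity check, here is each selector form on a tiny throwaway DataFrame (the row_a/col_x labels are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {'col_x': [1, 2, 3], 'col_y': [4, 5, 6]},
    index=['row_a', 'row_b', 'row_c']
)

df.loc['row_a']                     # single label -> Series (one row)
df.loc[['row_a', 'row_c']]          # list of labels -> DataFrame
df.loc['row_a':'row_b']             # label slice -> rows row_a AND row_b
df.loc[df['col_x'] > 1]             # boolean Series -> rows where True
df.loc[lambda d: d['col_y'] == 5]   # callable -> same rows as the boolean form
df.loc['row_a', 'col_y']            # row + column selection -> scalar 4
```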


loc vs iloc vs at vs iat: Comparison Table

Pandas provides four indexing accessors. Choosing the right one depends on whether you work with labels or integer positions and whether you need a single cell or a range.

Accessor | Indexing Type          | Returns                      | Accepts Slices      | Speed                   | Best Use Case
.loc[]   | Label-based            | Scalar, Series, or DataFrame | Yes (inclusive)     | Good                    | General label-based selection and filtering
.iloc[]  | Integer position-based | Scalar, Series, or DataFrame | Yes (exclusive end) | Good                    | Positional selection when labels are unknown
.at[]    | Label-based            | Scalar only                  | No                  | Fastest for single cell | Read/write a single value by label
.iat[]   | Integer position-based | Scalar only                  | No                  | Fastest for single cell | Read/write a single value by position
Key difference: .loc[] slicing is inclusive on both ends (df.loc['a':'c'] includes row 'c'), while .iloc[] slicing is exclusive on the end (df.iloc[0:3] does not include index 3). This catches many newcomers off guard.
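
A two-line demonstration of the slicing difference, using a throwaway Series with labels a through d:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

print(list(s.loc['a':'c']))  # [10, 20, 30] -- label 'c' IS included
print(list(s.iloc[0:3]))     # [10, 20, 30] -- position 3 is NOT included
```

Both calls happen to return three values here; the point is that .loc[] counts the endpoint label in, while .iloc[] stops before the stop position.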


Sample DataFrame

All examples below use this DataFrame unless stated otherwise:

import pandas as pd
 
df = pd.DataFrame({
    'name':    ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'dept':    ['Engineering', 'Marketing', 'Engineering', 'Sales', 'Marketing'],
    'salary':  [95000, 72000, 88000, 67000, 81000],
    'years':   [5, 3, 7, 2, 4],
    'rating':  [4.5, 3.8, 4.2, 3.5, 4.0]
}, index=['e001', 'e002', 'e003', 'e004', 'e005'])
 
print(df)

Output:

         name         dept  salary  years  rating
e001    Alice  Engineering   95000      5     4.5
e002      Bob    Marketing   72000      3     3.8
e003  Charlie  Engineering   88000      7     4.2
e004    David        Sales   67000      2     3.5
e005      Eve    Marketing   81000      4     4.0

Select a Single Row by Label

Pass one label to .loc[] to get a row as a Series:

row = df.loc['e003']
print(row)

Output:

name       Charlie
dept       Engineering
salary     88000
years      7
rating     4.2
Name: e003, dtype: object

To get a single-row DataFrame instead of a Series, wrap the label in a list:

row_df = df.loc[['e003']]
print(type(row_df))  # <class 'pandas.core.frame.DataFrame'>

Select Multiple Rows by Label List

Pass a list of labels to select several rows:

subset = df.loc[['e001', 'e004', 'e005']]
print(subset)

Output:

       name         dept  salary  years  rating
e001  Alice  Engineering   95000      5     4.5
e004  David        Sales   67000      2     3.5
e005    Eve    Marketing   81000      4     4.0

The rows appear in the order you specify, not the order of the original index.
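
A minimal sketch of this ordering behavior, on a made-up single-column frame:

```python
import pandas as pd

df = pd.DataFrame({'v': [1, 2, 3]}, index=['e001', 'e002', 'e003'])

# The label list controls the output order, so this reverses the rows
reordered = df.loc[['e003', 'e001']]
print(list(reordered.index))  # ['e003', 'e001']
```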


Select a Row Range by Label Slice

Label slicing with .loc[] is inclusive on both endpoints. This is different from standard Python slicing and from .iloc[].

# Returns e002, e003, and e004 -- all three
range_rows = df.loc['e002':'e004']
print(range_rows)

Output:

         name         dept  salary  years  rating
e002      Bob    Marketing   72000      3     3.8
e003  Charlie  Engineering   88000      7     4.2
e004    David        Sales   67000      2     3.5

Select a Single Column

Use the second argument to pick a column. The colon : in the first position means "all rows."

names = df.loc[:, 'name']
print(names)

Output:

e001      Alice
e002        Bob
e003    Charlie
e004      David
e005        Eve
Name: name, dtype: object

Select Multiple Columns

Pass a list of column names:

subset_cols = df.loc[:, ['name', 'salary', 'rating']]
print(subset_cols)

Output:

         name  salary  rating
e001    Alice   95000     4.5
e002      Bob   72000     3.8
e003  Charlie   88000     4.2
e004    David   67000     3.5
e005      Eve   81000     4.0

Column slicing also works and is inclusive:

# Columns from 'dept' through 'years'
df.loc[:, 'dept':'years']

Select Rows and Columns Together

Combine row and column selectors in one call:

# Specific rows and specific columns
result = df.loc[['e001', 'e003'], ['name', 'salary']]
print(result)

Output:

         name  salary
e001    Alice   95000
e003  Charlie   88000

Mix slices with lists:

# Row slice + column list
df.loc['e002':'e004', ['name', 'rating']]

Boolean Indexing: Filter Rows by Condition

The most powerful use of .loc[] is filtering rows with boolean conditions while simultaneously selecting columns.

# Employees with salary above 80000
high_earners = df.loc[df['salary'] > 80000]
print(high_earners)

Output:

         name         dept  salary  years  rating
e001    Alice  Engineering   95000      5     4.5
e003  Charlie  Engineering   88000      7     4.2
e005      Eve    Marketing   81000      4     4.0

Filter rows and select specific columns:

# Names and ratings of high earners
df.loc[df['salary'] > 80000, ['name', 'rating']]

Multiple Conditions with & and |

Combine conditions using & (and), | (or), and ~ (not). Each condition must be wrapped in parentheses.

# Engineering employees with 4+ years
senior_eng = df.loc[
    (df['dept'] == 'Engineering') & (df['years'] >= 4),
    ['name', 'years', 'salary']
]
print(senior_eng)

Output:

         name  years  salary
e001    Alice      5   95000
e003  Charlie      7   88000

Using | for "or" conditions:

# Marketing OR Sales department
mkt_sales = df.loc[
    (df['dept'] == 'Marketing') | (df['dept'] == 'Sales')
]
print(mkt_sales)

Using ~ for negation:

# Everyone NOT in Engineering
non_eng = df.loc[~(df['dept'] == 'Engineering')]
print(non_eng)

For membership tests, .isin() is cleaner than chaining |:

df.loc[df['dept'].isin(['Marketing', 'Sales'])]

Setting Values with loc

.loc[] is the recommended way to update DataFrame values. It modifies the DataFrame in-place without triggering SettingWithCopyWarning.

Set a single cell

df.loc['e004', 'salary'] = 70000
print(df.loc['e004', 'salary'])  # 70000

Set an entire column for matching rows

# Give a 10% raise to everyone in Engineering
df.loc[df['dept'] == 'Engineering', 'salary'] = (
    df.loc[df['dept'] == 'Engineering', 'salary'] * 1.10
).astype(int)
print(df.loc[df['dept'] == 'Engineering', ['name', 'salary']])

Create a new column conditionally

df.loc[df['rating'] >= 4.0, 'performance'] = 'high'
df.loc[df['rating'] < 4.0, 'performance'] = 'standard'
print(df[['name', 'rating', 'performance']])

Output:

         name  rating performance
e001    Alice     4.5        high
e002      Bob     3.8    standard
e003  Charlie     4.2        high
e004    David     3.5    standard
e005      Eve     4.0        high
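
If you prefer to build the column in a single pass, numpy.where expresses the same if/else logic and avoids leaving NaN in rows that match neither condition. A sketch using the same names and ratings:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name':   ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'rating': [4.5, 3.8, 4.2, 3.5, 4.0]
})

# One vectorized pass: 'high' where the condition holds, 'standard' elsewhere
df['performance'] = np.where(df['rating'] >= 4.0, 'high', 'standard')
print(df)
```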

Using loc with MultiIndex

.loc[] handles hierarchical indexes naturally with tuples.

# Create a MultiIndex DataFrame
arrays = [
    ['Q1', 'Q1', 'Q2', 'Q2', 'Q3', 'Q3'],
    ['North', 'South', 'North', 'South', 'North', 'South']
]
index = pd.MultiIndex.from_arrays(arrays, names=['quarter', 'region'])
 
sales = pd.DataFrame({
    'revenue': [100, 150, 120, 180, 90, 200],
    'units': [10, 15, 12, 18, 9, 20]
}, index=index)
 
print(sales)

Select by first level:

# All Q1 data
sales.loc['Q1']

Select by both levels using a tuple:

# Q2, North specifically
sales.loc[('Q2', 'North')]

Slice across levels:

# Q1 through Q2, all regions
sales.loc['Q1':'Q2']

Use pd.IndexSlice for advanced cross-section slicing:

idx = pd.IndexSlice
# All quarters, only South region
sales.loc[idx[:, 'South'], :]

loc with Callable (Lambda Filtering)

Pass a callable (typically a lambda) to .loc[]. The callable receives the DataFrame and must return a valid indexer.

# Filter using a lambda -- employees whose name starts with a vowel
vowel_names = df.loc[lambda x: x['name'].str[0].isin(['A', 'E', 'I', 'O', 'U'])]
print(vowel_names[['name', 'dept']])

Output:

       name         dept
e001  Alice  Engineering
e005    Eve    Marketing

Callables are especially useful in method chains where you cannot reference df directly:

result = (
    df
    .assign(bonus=lambda x: x['salary'] * 0.1)
    .loc[lambda x: x['bonus'] > 8000, ['name', 'salary', 'bonus']]
)
print(result)

Common Gotchas

1. Inclusive Slicing

New users familiar with Python lists or .iloc[] expect the end of a slice to be excluded. With .loc[], both endpoints are included.

# Python list: [0, 1, 2, 3, 4][1:3] gives [1, 2]  -- excludes index 3
# pandas loc: df.loc['e001':'e003'] gives e001, e002, e003 -- includes e003

2. SettingWithCopyWarning

This warning fires when you chain indexing operations. Avoid it by using .loc[] directly.

# WRONG -- chained indexing, may not update the original DataFrame
# df[df['dept'] == 'Sales']['salary'] = 75000
 
# CORRECT -- single .loc[] call
df.loc[df['dept'] == 'Sales', 'salary'] = 75000

With Copy-on-Write enabled (optional since pandas 2.0, the default in pandas 3.0), chained assignment never updates the original DataFrame -- it silently does nothing. Use .loc[] to be safe across all versions.

3. KeyError from Missing Labels

.loc[] raises KeyError if a label does not exist in the index.

# This raises KeyError if 'e999' is not in the index
# df.loc['e999']
 
# Safe alternative: use .reindex() or check first
if 'e999' in df.index:
    row = df.loc['e999']

4. Boolean Series Must Be Aligned

The boolean array passed to .loc[] must have the same index as the DataFrame. A common mistake is filtering with a boolean array from a different DataFrame.

# Correct: boolean Series from the same DataFrame
mask = df['salary'] > 80000  # Index matches df
df.loc[mask]
 
# Incorrect: boolean array from a different-length source
# external_mask = pd.Series([True, False, True])  # length 3, df has 5 rows
# df.loc[external_mask]  # Raises IndexingError

Performance Tips

.loc[] is efficient for most workloads, but certain patterns are faster than others.

Operation                            | Tip
Single cell read/write               | Use .at[] instead -- 5-10x faster than .loc[] for scalar access
Bulk boolean filter                  | .loc[mask] is fast. Precompute the mask as a variable if reusing it
Large DataFrame + complex conditions | Combine conditions into a single mask before passing to .loc[]
Iterating row by row                 | Avoid .loc[] in a loop. Use vectorized operations or .apply()
String column filters                | Use .str accessor methods to build the mask, then pass to .loc[]
MultiIndex lookups                   | .xs() can be faster for cross-section selection on one level
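
As a sketch of the .xs() tip, here is a cross-section on the second level of a small MultiIndex frame, rebuilt inline so the block runs on its own:

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [['Q1', 'Q1', 'Q2', 'Q2'], ['North', 'South', 'North', 'South']],
    names=['quarter', 'region']
)
sales = pd.DataFrame({'revenue': [100, 150, 120, 180]}, index=index)

# Cross-section on one level; comparable to sales.loc[pd.IndexSlice[:, 'South'], :]
south = sales.xs('South', level='region')
print(south)  # quarters Q1 and Q2, revenue 150 and 180
```

Note that .xs() drops the selected level by default; pass drop_level=False to keep it.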

General rule: if you catch yourself writing a for-loop with .loc[] inside, there is almost always a vectorized alternative.
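
To make the rule concrete, here is a loop-based .loc[] pattern next to its vectorized equivalent, on a made-up three-row frame:

```python
import pandas as pd

df = pd.DataFrame({'salary': [95000, 72000, 88000]},
                  index=['e001', 'e002', 'e003'])

# Slow pattern: one .loc[] lookup per row inside a Python loop
bonuses = {}
for label in df.index:
    bonuses[label] = df.loc[label, 'salary'] * 0.1

# Vectorized equivalent: a single whole-column operation
df['bonus'] = df['salary'] * 0.1
```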


Real-World Example: Filtering and Updating a Sales DataFrame

Here is a realistic workflow that combines multiple .loc[] patterns.

import pandas as pd
import numpy as np
 
# Simulated sales data
np.random.seed(42)
n = 1000
sales = pd.DataFrame({
    'order_id': range(1, n + 1),
    'region': np.random.choice(['East', 'West', 'North', 'South'], n),
    'product': np.random.choice(['Widget', 'Gadget', 'Doohickey'], n),
    'quantity': np.random.randint(1, 50, n),
    'unit_price': np.random.uniform(5, 100, n).round(2),
    'status': np.random.choice(['completed', 'pending', 'cancelled'], n, p=[0.7, 0.2, 0.1])
})
sales['total'] = sales['quantity'] * sales['unit_price']
sales = sales.set_index('order_id')
 
# 1. View completed orders in the East region
east_completed = sales.loc[
    (sales['region'] == 'East') & (sales['status'] == 'completed'),
    ['product', 'quantity', 'total']
]
print(f"East completed orders: {len(east_completed)}")
 
# 2. Flag high-value orders (total > 2000)
sales.loc[sales['total'] > 2000, 'tier'] = 'premium'
sales.loc[sales['total'] <= 2000, 'tier'] = 'standard'
 
# 3. Apply a 5% discount to pending Widget orders
mask = (sales['status'] == 'pending') & (sales['product'] == 'Widget')
sales.loc[mask, 'total'] = (sales.loc[mask, 'total'] * 0.95).round(2)
 
# 4. Summary of premium orders by region
premium = sales.loc[sales['tier'] == 'premium']
print(premium.groupby('region')['total'].agg(['count', 'mean', 'sum']).round(2))

This pattern -- filter, update, re-filter -- is the bread and butter of pandas data wrangling, and .loc[] handles every step.


Visualize Your Selections with PyGWalker

After selecting and filtering data with .loc[], visual exploration helps you validate the results and find patterns faster. PyGWalker turns any pandas DataFrame into an interactive Tableau-like interface inside Jupyter notebooks. Instead of writing separate plotting code for each question, you drag and drop columns to build charts instantly.

import pygwalker as pyg
 
# Select a subset with loc and visualize it
filtered = sales.loc[
    (sales['region'].isin(['East', 'West'])) & (sales['status'] == 'completed')
]
pyg.walk(filtered)

PyGWalker is particularly useful after .loc[] operations because you can visually cross-check that your filters returned the expected data, then explore distributions and outliers without writing additional code.


FAQ

What does pandas loc do?

pandas .loc[] is a label-based indexer for selecting rows and columns from a DataFrame using index labels, column names, boolean arrays, or callables. It returns a subset of the DataFrame matching the specified labels. Unlike .iloc[], which uses integer positions, .loc[] uses the actual names in the index and column headers.

What is the difference between loc and iloc in pandas?

.loc[] uses label-based indexing and is inclusive on both ends of a slice. .iloc[] uses integer position-based indexing and is exclusive on the end of a slice (like standard Python). For example, df.loc['a':'c'] includes rows labeled 'a', 'b', and 'c', while df.iloc[0:3] includes positions 0, 1, and 2 only. Use .loc[] when you know the label names; use .iloc[] when you know the numeric positions.

How do I filter rows with multiple conditions using loc?

Combine boolean conditions with & (and), | (or), and ~ (not), wrapping each condition in parentheses. For example: df.loc[(df['age'] > 30) & (df['city'] == 'NYC'), ['name', 'salary']]. Do not use Python's and/or keywords, as they cannot operate on pandas Series element-wise and will raise a ValueError.

How do I set values with loc without getting SettingWithCopyWarning?

Always use .loc[] directly on the original DataFrame in a single indexing operation. Write df.loc[df['col'] > 5, 'col2'] = new_value instead of df[df['col'] > 5]['col2'] = new_value. The chained form creates an intermediate copy, so the assignment may not propagate to the original DataFrame.

Can I use loc with a MultiIndex DataFrame?

Yes. Pass a tuple of labels to access specific levels: df.loc[('level1_val', 'level2_val')]. For partial indexing on the first level, pass just the first-level label: df.loc['level1_val']. For advanced cross-section slicing across levels, use pd.IndexSlice: df.loc[pd.IndexSlice[:, 'level2_val'], :].

Conclusion

pandas .loc[] is the go-to accessor for label-based DataFrame selection. It handles single-cell reads, multi-row slicing, boolean filtering with compound conditions, in-place value assignment, and MultiIndex lookups -- all with a consistent df.loc[rows, columns] syntax.

Key points to remember:

  • Slicing is inclusive on both ends, unlike .iloc[] and standard Python
  • Use .loc[] for assignment to avoid SettingWithCopyWarning
  • Wrap compound conditions in parentheses and use &, |, ~ instead of and, or, not
  • Use .at[] for single-cell access when performance matters
  • Callables enable clean filtering inside method chains

For interactive exploration of your .loc[] results, PyGWalker lets you drag-and-drop columns to build visualizations directly from any filtered DataFrame -- no additional plotting code required.
