
Pandas loc: Select and Filter DataFrame Rows and Columns by Label


DataFrame.loc[] is the primary label-based indexer in pandas. It selects rows and columns by their index labels or boolean arrays, returning anything from a single scalar value to a full DataFrame subset.

Most pandas beginners start with bracket notation like df['column'] or df[0:5], then quickly hit walls: they need to select specific rows by name, filter rows while picking certain columns, or update values in-place without triggering warnings. The .loc[] accessor solves all of these problems with a single, consistent interface.

This guide covers every common .loc[] pattern -- from basic row selection to MultiIndex lookups, boolean filtering with compound conditions, and safe value assignment. Each section includes runnable code so you can test directly in your notebook.


Quick Syntax Reference

The general form of .loc[] is:

df.loc[row_selection, column_selection]

Both row_selection and column_selection accept:

  • A single label: 'row_a' or 'col_x'
  • A list of labels: ['row_a', 'row_c']
  • A slice of labels: 'row_a':'row_d' (inclusive on both ends)
  • A boolean array or Series
  • A callable that takes the DataFrame/Series and returns one of the above

If you omit column_selection, .loc[] returns all columns.
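To make these selector forms concrete, here is a minimal sketch on a toy DataFrame (the row_a/col_x labels mirror the list above and are otherwise arbitrary):

```python
import pandas as pd

df = pd.DataFrame(
    {'col_x': [1, 2, 3, 4], 'col_y': [10, 20, 30, 40]},
    index=['row_a', 'row_b', 'row_c', 'row_d']
)

df.loc['row_a', 'col_x']           # single labels -> scalar: 1
df.loc[['row_a', 'row_c']]         # list of labels -> two-row DataFrame
df.loc['row_a':'row_c']            # label slice -> row_a, row_b, AND row_c
df.loc[df['col_x'] > 2]            # boolean Series -> row_c and row_d
df.loc[lambda d: d['col_y'] < 25]  # callable -> row_a and row_b
```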


loc vs iloc vs at vs iat: Comparison Table

Pandas provides four indexing accessors. Choosing the right one depends on whether you work with labels or integer positions and whether you need a single cell or a range.

| Accessor | Indexing Type | Returns | Accepts Slices | Speed | Best Use Case |
| --- | --- | --- | --- | --- | --- |
| .loc[] | Label-based | Scalar, Series, or DataFrame | Yes (inclusive) | Good | General label-based selection and filtering |
| .iloc[] | Integer position-based | Scalar, Series, or DataFrame | Yes (exclusive end) | Good | Positional selection when labels are unknown |
| .at[] | Label-based | Scalar only | No | Fastest for single cell | Read/write a single value by label |
| .iat[] | Integer position-based | Scalar only | No | Fastest for single cell | Read/write a single value by position |

Key difference: .loc[] slicing is inclusive on both ends (df.loc['a':'c'] includes row 'c'), while .iloc[] slicing is exclusive on the end (df.iloc[0:3] does not include index 3). This catches many newcomers off guard.
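A quick sketch showing all four accessors reading the same cell, plus the slicing difference (toy data; the labels are arbitrary):

```python
import pandas as pd

df = pd.DataFrame({'x': [10, 20, 30]}, index=['a', 'b', 'c'])

# Four ways to read the middle cell
df.loc['b', 'x']   # 20 -- by label
df.iloc[1, 0]      # 20 -- by position
df.at['b', 'x']    # 20 -- by label, scalar only
df.iat[1, 0]       # 20 -- by position, scalar only

# Slicing: .loc[] includes the end label, .iloc[] excludes the end position
len(df.loc['a':'b'])  # 2 rows -- 'b' is included
len(df.iloc[0:1])     # 1 row  -- position 1 is excluded
```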


Sample DataFrame

All examples below use this DataFrame unless stated otherwise:

import pandas as pd
 
df = pd.DataFrame({
    'name':    ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'dept':    ['Engineering', 'Marketing', 'Engineering', 'Sales', 'Marketing'],
    'salary':  [95000, 72000, 88000, 67000, 81000],
    'years':   [5, 3, 7, 2, 4],
    'rating':  [4.5, 3.8, 4.2, 3.5, 4.0]
}, index=['e001', 'e002', 'e003', 'e004', 'e005'])
 
print(df)

Output:

         name         dept  salary  years  rating
e001    Alice  Engineering   95000      5     4.5
e002      Bob    Marketing   72000      3     3.8
e003  Charlie  Engineering   88000      7     4.2
e004    David        Sales   67000      2     3.5
e005      Eve    Marketing   81000      4     4.0

Select a Single Row by Label

Pass one label to .loc[] to get a row as a Series:

row = df.loc['e003']
print(row)

Output:

name          Charlie
dept      Engineering
salary          88000
years               7
rating            4.2
Name: e003, dtype: object

To get a single-row DataFrame instead of a Series, wrap the label in a list:

row_df = df.loc[['e003']]
print(type(row_df))  # <class 'pandas.core.frame.DataFrame'>

Select Multiple Rows by Label List

Pass a list of labels to select several rows:

subset = df.loc[['e001', 'e004', 'e005']]
print(subset)

Output:

       name       dept  salary  years  rating
e001  Alice  Engineering   95000      5     4.5
e004  David        Sales   67000      2     3.5
e005    Eve    Marketing   81000      4     4.0

The rows appear in the order you specify, not the order of the original index.


Select a Row Range by Label Slice

Label slicing with .loc[] is inclusive on both endpoints. This is different from standard Python slicing and from .iloc[].

# Returns e002, e003, and e004 -- all three
range_rows = df.loc['e002':'e004']
print(range_rows)

Output:

        name         dept  salary  years  rating
e002     Bob    Marketing   72000      3     3.8
e003 Charlie  Engineering   88000      7     4.2
e004   David        Sales   67000      2     3.5

Select a Single Column

Use the second argument to pick a column. The colon : in the first position means "all rows."

names = df.loc[:, 'name']
print(names)

Output:

e001      Alice
e002        Bob
e003    Charlie
e004      David
e005        Eve
Name: name, dtype: object

Select Multiple Columns

Pass a list of column names:

subset_cols = df.loc[:, ['name', 'salary', 'rating']]
print(subset_cols)

Output:

          name  salary  rating
e001     Alice   95000     4.5
e002       Bob   72000     3.8
e003   Charlie   88000     4.2
e004     David   67000     3.5
e005       Eve   81000     4.0

Column slicing also works and is inclusive:

# Columns from 'dept' through 'years'
df.loc[:, 'dept':'years']

Select Rows and Columns Together

Combine row and column selectors in one call:

# Specific rows and specific columns
result = df.loc[['e001', 'e003'], ['name', 'salary']]
print(result)

Output:

          name  salary
e001     Alice   95000
e003   Charlie   88000

Mix slices with lists:

# Row slice + column list
df.loc['e002':'e004', ['name', 'rating']]

Boolean Indexing: Filter Rows by Condition

The most powerful use of .loc[] is filtering rows with boolean conditions while simultaneously selecting columns.

# Employees with salary above 80000
high_earners = df.loc[df['salary'] > 80000]
print(high_earners)

Output:

          name         dept  salary  years  rating
e001     Alice  Engineering   95000      5     4.5
e003   Charlie  Engineering   88000      7     4.2
e005       Eve    Marketing   81000      4     4.0

Filter rows and select specific columns:

# Names and ratings of high earners
df.loc[df['salary'] > 80000, ['name', 'rating']]

Multiple Conditions with & and |

Combine conditions using & (and), | (or), and ~ (not). Each condition must be wrapped in parentheses.

# Engineering employees with 4+ years
senior_eng = df.loc[
    (df['dept'] == 'Engineering') & (df['years'] >= 4),
    ['name', 'years', 'salary']
]
print(senior_eng)

Output:

          name  years  salary
e001     Alice      5   95000
e003   Charlie      7   88000

Using | for "or" conditions:

# Marketing OR Sales department
mkt_sales = df.loc[
    (df['dept'] == 'Marketing') | (df['dept'] == 'Sales')
]
print(mkt_sales)

Using ~ for negation:

# Everyone NOT in Engineering
non_eng = df.loc[~(df['dept'] == 'Engineering')]
print(non_eng)

For membership tests, .isin() is cleaner than chaining |:

df.loc[df['dept'].isin(['Marketing', 'Sales'])]

Setting Values with loc

.loc[] is the recommended way to update DataFrame values. It modifies the DataFrame in-place without triggering SettingWithCopyWarning.

Set a single cell

df.loc['e004', 'salary'] = 70000
print(df.loc['e004', 'salary'])  # 70000

Set an entire column for matching rows

# Give a 10% raise to everyone in Engineering
df.loc[df['dept'] == 'Engineering', 'salary'] = (
    df.loc[df['dept'] == 'Engineering', 'salary'] * 1.10
).astype(int)
print(df.loc[df['dept'] == 'Engineering', ['name', 'salary']])

Create a new column conditionally

df.loc[df['rating'] >= 4.0, 'performance'] = 'high'
df.loc[df['rating'] < 4.0, 'performance'] = 'standard'
print(df[['name', 'rating', 'performance']])

Output:

         name  rating  performance
e001    Alice     4.5         high
e002      Bob     3.8     standard
e003  Charlie     4.2         high
e004    David     3.5     standard
e005      Eve     4.0         high
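For a two-way split like this, numpy's np.where builds the same column in a single vectorized pass (an equivalent alternative to the two .loc[] assignments, shown here on a pared-down DataFrame):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'rating': [4.5, 3.8, 4.2, 3.5, 4.0]})

# One vectorized pass instead of two .loc[] assignments
df['performance'] = np.where(df['rating'] >= 4.0, 'high', 'standard')
print(df['performance'].tolist())
# ['high', 'standard', 'high', 'standard', 'high']
```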

Using loc with MultiIndex

.loc[] handles hierarchical indexes naturally with tuples.

# Create a MultiIndex DataFrame
arrays = [
    ['Q1', 'Q1', 'Q2', 'Q2', 'Q3', 'Q3'],
    ['North', 'South', 'North', 'South', 'North', 'South']
]
index = pd.MultiIndex.from_arrays(arrays, names=['quarter', 'region'])
 
sales = pd.DataFrame({
    'revenue': [100, 150, 120, 180, 90, 200],
    'units': [10, 15, 12, 18, 9, 20]
}, index=index)
 
print(sales)

Select by first level:

# All Q1 data
sales.loc['Q1']

Select by both levels using a tuple:

# Q2, North specifically
sales.loc[('Q2', 'North')]

Slice across levels:

# Q1 through Q2, all regions
sales.loc['Q1':'Q2']

Use pd.IndexSlice for advanced cross-section slicing:

idx = pd.IndexSlice
# All quarters, only South region
sales.loc[idx[:, 'South'], :]
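For single-level cross-sections like this, the .xs() method is a compact alternative; it drops the selected level by default. A sketch, rebuilding a smaller version of the sales DataFrame so the snippet runs standalone:

```python
import pandas as pd

arrays = [['Q1', 'Q1', 'Q2', 'Q2'], ['North', 'South', 'North', 'South']]
index = pd.MultiIndex.from_arrays(arrays, names=['quarter', 'region'])
sales = pd.DataFrame({'revenue': [100, 150, 120, 180]}, index=index)

# Cross-section on the 'region' level; that level is dropped from the result
south = sales.xs('South', level='region')
print(south)
```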

loc with Callable (Lambda Filtering)

Pass a callable (typically a lambda) to .loc[]. The callable receives the DataFrame and must return a valid indexer.

# Filter using a lambda -- employees whose name starts with a vowel
vowel_names = df.loc[lambda x: x['name'].str[0].isin(['A', 'E', 'I', 'O', 'U'])]
print(vowel_names[['name', 'dept']])

Output:

       name         dept
e001  Alice  Engineering
e005    Eve    Marketing

Callables are especially useful in method chains where you cannot reference df directly:

result = (
    df
    .assign(bonus=lambda x: x['salary'] * 0.1)
    .loc[lambda x: x['bonus'] > 8000, ['name', 'salary', 'bonus']]
)
print(result)

Common Gotchas

1. Inclusive Slicing

New users familiar with Python lists or .iloc[] expect the end of a slice to be excluded. With .loc[], both endpoints are included.

lst = [0, 1, 2, 3, 4]
print(lst[1:3])  # [1, 2] -- Python excludes index 3
print(df.loc['e001':'e003'].index.tolist())
# ['e001', 'e002', 'e003'] -- .loc[] includes e003

2. SettingWithCopyWarning

This warning fires when you chain indexing operations. Avoid it by using .loc[] directly.

# WRONG -- chained indexing, may not update the original DataFrame
# df[df['dept'] == 'Sales']['salary'] = 75000
 
# CORRECT -- single .loc[] call
df.loc[df['dept'] == 'Sales', 'salary'] = 75000

Starting with pandas 2.0, when Copy-on-Write is enabled (it is the default behavior in pandas 3.0), chained assignment never works: pandas emits a ChainedAssignmentError warning and the original DataFrame is left unchanged. Use .loc[] to be safe across all versions.

3. KeyError from Missing Labels

.loc[] raises KeyError if a label does not exist in the index.

# This raises KeyError if 'e999' is not in the index
# df.loc['e999']
 
# Safe alternative: use .reindex() or check first
if 'e999' in df.index:
    row = df.loc['e999']
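The .reindex() alternative returns a row of NaN for missing labels instead of raising. A sketch using a pared-down version of the sample DataFrame:

```python
import pandas as pd

df = pd.DataFrame(
    {'name': ['Alice', 'Bob'], 'salary': [95000, 72000]},
    index=['e001', 'e002']
)

# Missing labels come back as NaN rows rather than raising KeyError
safe = df.reindex(['e001', 'e999'])
print(safe)
```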

4. Boolean Series Must Be Aligned

The boolean array passed to .loc[] must have the same index as the DataFrame. A common mistake is filtering with a boolean array from a different DataFrame.

# Correct: boolean Series from the same DataFrame
mask = df['salary'] > 80000  # Index matches df
df.loc[mask]
 
# Incorrect: boolean array from a different-length source
# external_mask = pd.Series([True, False, True])  # length 3, df has 5 rows
# df.loc[external_mask]  # Raises IndexingError

Performance Tips

.loc[] is efficient for most workloads, but certain patterns are faster than others.

| Operation | Tip |
| --- | --- |
| Single cell read/write | Use .at[] instead -- 5-10x faster than .loc[] for scalar access |
| Bulk boolean filter | .loc[mask] is fast. Precompute the mask as a variable if reusing it |
| Large DataFrame + complex conditions | Combine conditions into a single mask before passing to .loc[] |
| Iterating row by row | Avoid .loc[] in a loop. Use vectorized operations or .apply() |
| String column filters | Use .str accessor methods to build the mask, then pass to .loc[] |
| MultiIndex lookups | .xs() can be faster for cross-section selection on one level |
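To check the scalar-access claim on your own machine, here is a minimal timeit sketch; absolute timings vary by pandas version and hardware, so no specific numbers are claimed:

```python
import timeit

import pandas as pd

df = pd.DataFrame({'x': range(100_000)})

# Both read the same cell; .at[] skips the general indexing machinery
t_loc = timeit.timeit(lambda: df.loc[500, 'x'], number=10_000)
t_at = timeit.timeit(lambda: df.at[500, 'x'], number=10_000)
print(f".loc: {t_loc:.4f}s   .at: {t_at:.4f}s")
```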

General rule: if you catch yourself writing a for-loop with .loc[] inside, there is almost always a vectorized alternative.
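For example, this loop-based bonus computation and its vectorized equivalent produce identical results (the column names are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'salary': [95000, 72000, 88000]})

# Slow: one .loc[] read and one .loc[] write per row
for i in df.index:
    df.loc[i, 'bonus_slow'] = df.loc[i, 'salary'] * 0.1

# Fast: a single vectorized expression over the whole column
df['bonus_fast'] = df['salary'] * 0.1
```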


Real-World Example: Filtering and Updating a Sales DataFrame

Here is a realistic workflow that combines multiple .loc[] patterns.

import pandas as pd
import numpy as np
 
# Simulated sales data
np.random.seed(42)
n = 1000
sales = pd.DataFrame({
    'order_id': range(1, n + 1),
    'region': np.random.choice(['East', 'West', 'North', 'South'], n),
    'product': np.random.choice(['Widget', 'Gadget', 'Doohickey'], n),
    'quantity': np.random.randint(1, 50, n),
    'unit_price': np.random.uniform(5, 100, n).round(2),
    'status': np.random.choice(['completed', 'pending', 'cancelled'], n, p=[0.7, 0.2, 0.1])
})
sales['total'] = sales['quantity'] * sales['unit_price']
sales = sales.set_index('order_id')
 
# 1. View completed orders in the East region
east_completed = sales.loc[
    (sales['region'] == 'East') & (sales['status'] == 'completed'),
    ['product', 'quantity', 'total']
]
print(f"East completed orders: {len(east_completed)}")
 
# 2. Flag high-value orders (total > 2000)
sales.loc[sales['total'] > 2000, 'tier'] = 'premium'
sales.loc[sales['total'] <= 2000, 'tier'] = 'standard'
 
# 3. Apply a 5% discount to pending Widget orders
mask = (sales['status'] == 'pending') & (sales['product'] == 'Widget')
sales.loc[mask, 'total'] = (sales.loc[mask, 'total'] * 0.95).round(2)
 
# 4. Summary of premium orders by region
premium = sales.loc[sales['tier'] == 'premium']
print(premium.groupby('region')['total'].agg(['count', 'mean', 'sum']).round(2))

This pattern -- filter, update, re-filter -- is the bread and butter of pandas data wrangling, and .loc[] handles every step.


Visualize Your Selections with PyGWalker

After selecting and filtering data with .loc[], visual exploration helps you validate the results and find patterns faster. PyGWalker turns any pandas DataFrame into an interactive Tableau-like interface inside Jupyter notebooks. Instead of writing separate plotting code for each question, you drag and drop columns to build charts instantly.

import pygwalker as pyg
 
# Select a subset with loc and visualize it
filtered = sales.loc[
    (sales['region'].isin(['East', 'West'])) & (sales['status'] == 'completed')
]
pyg.walk(filtered)

PyGWalker is particularly useful after .loc[] operations because you can visually cross-check that your filters returned the expected data, then explore distributions and outliers without writing additional code.


FAQ

What does pandas loc do?

pandas .loc[] is a label-based indexer for selecting rows and columns from a DataFrame using index labels, column names, boolean arrays, or callables. It returns a subset of the DataFrame matching the specified labels. Unlike .iloc[], which uses integer positions, .loc[] uses the actual names in the index and column headers.

What is the difference between loc and iloc in pandas?

.loc[] uses label-based indexing and is inclusive on both ends of a slice. .iloc[] uses integer position-based indexing and is exclusive on the end of a slice (like standard Python). For example, df.loc['a':'c'] includes rows labeled 'a', 'b', and 'c', while df.iloc[0:3] includes positions 0, 1, and 2 only. Use .loc[] when you know the label names; use .iloc[] when you know the numeric positions.

How do I filter rows with multiple conditions using loc?

Combine boolean conditions with & (and), | (or), and ~ (not), wrapping each condition in parentheses. For example: df.loc[(df['age'] > 30) & (df['city'] == 'NYC'), ['name', 'salary']]. Do not use Python's and/or keywords, as they cannot operate on pandas Series element-wise and will raise a ValueError.

How do I set values with loc without getting SettingWithCopyWarning?

Always use .loc[] directly on the original DataFrame in a single indexing operation. Write df.loc[df['col'] > 5, 'col2'] = new_value instead of df[df['col'] > 5]['col2'] = new_value. The chained form creates an intermediate copy, so the assignment may not propagate to the original DataFrame.

Can I use loc with a MultiIndex DataFrame?

Yes. Pass a tuple of labels to access specific levels: df.loc[('level1_val', 'level2_val')]. For partial indexing on the first level, pass just the first-level label: df.loc['level1_val']. For advanced cross-section slicing across levels, use pd.IndexSlice: df.loc[pd.IndexSlice[:, 'level2_val'], :].

Conclusion

pandas .loc[] is the go-to accessor for label-based DataFrame selection. It handles single-cell reads, multi-row slicing, boolean filtering with compound conditions, in-place value assignment, and MultiIndex lookups -- all with a consistent df.loc[rows, columns] syntax.

Key points to remember:

  • Slicing is inclusive on both ends, unlike .iloc[] and standard Python
  • Use .loc[] for assignment to avoid SettingWithCopyWarning
  • Wrap compound conditions in parentheses and use &, |, ~ instead of and, or, not
  • Use .at[] for single-cell access when performance matters
  • Callables enable clean filtering inside method chains

For interactive exploration of your .loc[] results, PyGWalker lets you drag-and-drop columns to build visualizations directly from any filtered DataFrame -- no additional plotting code required.
