Pandas loc: Select and Filter DataFrame Rows and Columns by Label
DataFrame.loc[] is the primary label-based indexer in pandas. It selects rows and columns by their index labels or boolean arrays, returning anything from a single scalar value to a full DataFrame subset.
Most pandas beginners start with bracket notation like df['column'] or df[0:5], then quickly hit walls: they need to select specific rows by name, filter rows while picking certain columns, or update values in-place without triggering warnings. The .loc[] accessor solves all of these problems with a single, consistent interface.
This guide covers every common .loc[] pattern -- from basic row selection to MultiIndex lookups, boolean filtering with compound conditions, and safe value assignment. Each section includes runnable code so you can test directly in your notebook.
Quick Syntax Reference
The general form of .loc[] is:
df.loc[row_selection, column_selection]

Both row_selection and column_selection accept:
- A single label: 'row_a' or 'col_x'
- A list of labels: ['row_a', 'row_c']
- A slice of labels: 'row_a':'row_d' (inclusive on both ends)
- A boolean array or Series
- A callable that takes the DataFrame/Series and returns one of the above
If you omit column_selection, .loc[] returns all columns.
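The "all columns" shorthand is easy to verify. A minimal sketch using a throwaway two-row DataFrame (not the sample data used below):

```python
import pandas as pd

# Tiny illustrative DataFrame with hypothetical labels
df = pd.DataFrame({'x': [1, 2], 'y': [3, 4]}, index=['a', 'b'])

# These two are equivalent: omitting column_selection selects all columns
row1 = df.loc['a']       # shorthand
row2 = df.loc['a', :]    # explicit "all rows of columns"
print(row1.equals(row2))
```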
loc vs iloc vs at vs iat: Comparison Table
Pandas provides four indexing accessors. Choosing the right one depends on whether you work with labels or integer positions and whether you need a single cell or a range.
| Accessor | Indexing Type | Returns | Accepts Slices | Speed | Best Use Case |
|---|---|---|---|---|---|
| .loc[] | Label-based | Scalar, Series, or DataFrame | Yes (inclusive) | Good | General label-based selection and filtering |
| .iloc[] | Integer position-based | Scalar, Series, or DataFrame | Yes (exclusive end) | Good | Positional selection when labels are unknown |
| .at[] | Label-based | Scalar only | No | Fastest for single cell | Read/write a single value by label |
| .iat[] | Integer position-based | Scalar only | No | Fastest for single cell | Read/write a single value by position |
Key difference: .loc[] slicing is inclusive on both ends (df.loc['a':'c'] includes row 'c'), while .iloc[] slicing is exclusive on the end (df.iloc[0:3] does not include index 3). This catches many newcomers off guard.
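A quick sketch makes the off-by-one difference concrete (throwaway data with hypothetical labels):

```python
import pandas as pd

df = pd.DataFrame({'v': [10, 20, 30, 40]}, index=['a', 'b', 'c', 'd'])

# Label slice: the end label 'c' IS included
by_label = df.loc['a':'c']
print(by_label.index.tolist())  # ['a', 'b', 'c'] -- three rows

# Position slice: the end position 2 is NOT included
by_pos = df.iloc[0:2]
print(by_pos.index.tolist())    # ['a', 'b'] -- two rows
```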
Sample DataFrame
All examples below use this DataFrame unless stated otherwise:
import pandas as pd
df = pd.DataFrame({
'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'dept': ['Engineering', 'Marketing', 'Engineering', 'Sales', 'Marketing'],
'salary': [95000, 72000, 88000, 67000, 81000],
'years': [5, 3, 7, 2, 4],
'rating': [4.5, 3.8, 4.2, 3.5, 4.0]
}, index=['e001', 'e002', 'e003', 'e004', 'e005'])
print(df)

Output:
name dept salary years rating
e001 Alice Engineering 95000 5 4.5
e002 Bob Marketing 72000 3 3.8
e003 Charlie Engineering 88000 7 4.2
e004 David Sales 67000 2 3.5
e005 Eve Marketing 81000 4 4.0

Select a Single Row by Label
Pass one label to .loc[] to get a row as a Series:
row = df.loc['e003']
print(row)

Output:
name Charlie
dept Engineering
salary 88000
years 7
rating 4.2
Name: e003, dtype: object

To get a single-row DataFrame instead of a Series, wrap the label in a list:
row_df = df.loc[['e003']]
print(type(row_df))  # <class 'pandas.core.frame.DataFrame'>

Select Multiple Rows by Label List
Pass a list of labels to select several rows:
subset = df.loc[['e001', 'e004', 'e005']]
print(subset)

Output:
name dept salary years rating
e001 Alice Engineering 95000 5 4.5
e004 David Sales 67000 2 3.5
e005 Eve Marketing 81000 4 4.0

The rows appear in the order you specify, not the order of the original index.
Select a Row Range by Label Slice
Label slicing with .loc[] is inclusive on both endpoints. This is different from standard Python slicing and from .iloc[].
# Returns e002, e003, and e004 -- all three
range_rows = df.loc['e002':'e004']
print(range_rows)

Output:
name dept salary years rating
e002 Bob Marketing 72000 3 3.8
e003 Charlie Engineering 88000 7 4.2
e004 David Sales 67000 2 3.5

Select a Single Column
Use the second argument to pick a column. The colon : in the first position means "all rows."
names = df.loc[:, 'name']
print(names)

Output:
e001 Alice
e002 Bob
e003 Charlie
e004 David
e005 Eve
Name: name, dtype: object

Select Multiple Columns
Pass a list of column names:
subset_cols = df.loc[:, ['name', 'salary', 'rating']]
print(subset_cols)

Output:
name salary rating
e001 Alice 95000 4.5
e002 Bob 72000 3.8
e003 Charlie 88000 4.2
e004 David 67000 3.5
e005 Eve 81000 4.0

Column slicing also works and is inclusive:
# Columns from 'dept' through 'years'
df.loc[:, 'dept':'years']

Select Rows and Columns Together
Combine row and column selectors in one call:
# Specific rows and specific columns
result = df.loc[['e001', 'e003'], ['name', 'salary']]
print(result)

Output:
name salary
e001 Alice 95000
e003 Charlie 88000

Mix slices with lists:
# Row slice + column list
df.loc['e002':'e004', ['name', 'rating']]

Boolean Indexing: Filter Rows by Condition
The most powerful use of .loc[] is filtering rows with boolean conditions while simultaneously selecting columns.
# Employees with salary above 80000
high_earners = df.loc[df['salary'] > 80000]
print(high_earners)

Output:
name dept salary years rating
e001 Alice Engineering 95000 5 4.5
e003 Charlie Engineering 88000 7 4.2
e005 Eve Marketing 81000 4 4.0

Filter rows and select specific columns:
# Names and ratings of high earners
df.loc[df['salary'] > 80000, ['name', 'rating']]

Multiple Conditions with & and |
Combine conditions using & (and), | (or), and ~ (not). Each condition must be wrapped in parentheses.
# Engineering employees with 4+ years
senior_eng = df.loc[
(df['dept'] == 'Engineering') & (df['years'] >= 4),
['name', 'years', 'salary']
]
print(senior_eng)

Output:
name years salary
e001 Alice 5 95000
e003 Charlie 7 88000

Using | for "or" conditions:
# Marketing OR Sales department
mkt_sales = df.loc[
(df['dept'] == 'Marketing') | (df['dept'] == 'Sales')
]
print(mkt_sales)

Using ~ for negation:
# Everyone NOT in Engineering
non_eng = df.loc[~(df['dept'] == 'Engineering')]
print(non_eng)

For membership tests, .isin() is cleaner than chaining |:
df.loc[df['dept'].isin(['Marketing', 'Sales'])]

Setting Values with loc
.loc[] is the recommended way to update DataFrame values. It modifies the DataFrame in-place without triggering SettingWithCopyWarning.
Set a single cell
df.loc['e004', 'salary'] = 70000
print(df.loc['e004', 'salary'])  # 70000

Set an entire column for matching rows
# Give a 10% raise to everyone in Engineering
df.loc[df['dept'] == 'Engineering', 'salary'] = (
df.loc[df['dept'] == 'Engineering', 'salary'] * 1.10
).astype(int)
print(df.loc[df['dept'] == 'Engineering', ['name', 'salary']])

Create a new column conditionally
df.loc[df['rating'] >= 4.0, 'performance'] = 'high'
df.loc[df['rating'] < 4.0, 'performance'] = 'standard'
print(df[['name', 'rating', 'performance']])Output:
name rating performance
e001 Alice 4.5 high
e002 Bob 3.8 standard
e003 Charlie 4.2 high
e004 David 3.5 standard
e005 Eve 4.0 high

Using loc with MultiIndex
.loc[] handles hierarchical indexes naturally with tuples.
# Create a MultiIndex DataFrame
arrays = [
['Q1', 'Q1', 'Q2', 'Q2', 'Q3', 'Q3'],
['North', 'South', 'North', 'South', 'North', 'South']
]
index = pd.MultiIndex.from_arrays(arrays, names=['quarter', 'region'])
sales = pd.DataFrame({
'revenue': [100, 150, 120, 180, 90, 200],
'units': [10, 15, 12, 18, 9, 20]
}, index=index)
print(sales)

Select by first level:
# All Q1 data
sales.loc['Q1']

Select by both levels using a tuple:
# Q2, North specifically
sales.loc[('Q2', 'North')]

Slice across levels:
# Q1 through Q2, all regions
sales.loc['Q1':'Q2']

Use pd.IndexSlice for advanced cross-section slicing:
idx = pd.IndexSlice
# All quarters, only South region
sales.loc[idx[:, 'South'], :]

loc with Callable (Lambda Filtering)
Pass a callable (typically a lambda) to .loc[]. The callable receives the DataFrame and must return a valid indexer.
# Filter using a lambda -- employees whose name starts with a vowel
vowel_names = df.loc[lambda x: x['name'].str[0].isin(['A', 'E', 'I', 'O', 'U'])]
print(vowel_names[['name', 'dept']])

Output:
name dept
e001 Alice Engineering
e005 Eve Marketing

Callables are especially useful in method chains where you cannot reference df directly:
result = (
df
.assign(bonus=lambda x: x['salary'] * 0.1)
.loc[lambda x: x['bonus'] > 8000, ['name', 'salary', 'bonus']]
)
print(result)

Common Gotchas
1. Inclusive Slicing
New users familiar with Python lists or .iloc[] expect the end of a slice to be excluded. With .loc[], both endpoints are included.
# Python list: [0, 1, 2, 3, 4][1:3] gives [1, 2] -- excludes index 3
# pandas loc:   df.loc['e001':'e003'] gives e001, e002, e003 -- includes e003

2. SettingWithCopyWarning
This warning fires when you chain indexing operations. Avoid it by using .loc[] directly.
# WRONG -- chained indexing, may not update the original DataFrame
# df[df['dept'] == 'Sales']['salary'] = 75000
# CORRECT -- single .loc[] call
df.loc[df['dept'] == 'Sales', 'salary'] = 75000

With Copy-on-Write enabled (opt-in since pandas 2.0, the default in pandas 3.0), chained assignment never updates the original DataFrame -- recent versions emit a ChainedAssignmentError warning and discard the write. Use .loc[] to be safe across all versions.
3. KeyError from Missing Labels
.loc[] raises KeyError if a label does not exist in the index.
# This raises KeyError if 'e999' is not in the index
# df.loc['e999']
# Safe alternatives: check membership first, or use .reindex()
if 'e999' in df.index:
    row = df.loc['e999']

# .reindex() returns a NaN-filled row for missing labels instead of raising
df.reindex(['e001', 'e999'])

4. Boolean Series Must Be Aligned
The boolean array passed to .loc[] must have the same index as the DataFrame. A common mistake is filtering with a boolean array from a different DataFrame.
# Correct: boolean Series from the same DataFrame
mask = df['salary'] > 80000 # Index matches df
df.loc[mask]
# Incorrect: boolean array from a different-length source
# external_mask = pd.Series([True, False, True]) # length 3, df has 5 rows
# df.loc[external_mask]  # Raises IndexingError

Performance Tips
.loc[] is efficient for most workloads, but certain patterns are faster than others.
| Operation | Tip |
|---|---|
| Single cell read/write | Use .at[] instead -- 5-10x faster than .loc[] for scalar access |
| Bulk boolean filter | .loc[mask] is fast. Precompute mask as a variable if reusing it |
| Large DataFrame + complex conditions | Combine conditions into a single mask before passing to .loc[] |
| Iterating row by row | Avoid .loc[] in a loop. Use vectorized operations or .apply() |
| String column filters | Use .str accessor methods to build the mask, then pass to .loc[] |
| MultiIndex lookups | .xs() can be faster for cross-section selection on one level |
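As a sketch of the .xs() alternative mentioned in the table, here is a small MultiIndex frame rebuilt along the lines of the earlier quarterly sales example:

```python
import pandas as pd

# Rebuild a compact MultiIndex frame like the earlier sales example
index = pd.MultiIndex.from_arrays(
    [['Q1', 'Q1', 'Q2', 'Q2'], ['North', 'South', 'North', 'South']],
    names=['quarter', 'region'])
sales = pd.DataFrame({'revenue': [100, 150, 120, 180]}, index=index)

# Cross-section on the second level -- equivalent to
# sales.loc[pd.IndexSlice[:, 'South'], :] but terser, and it drops
# the selected level from the result
south = sales.xs('South', level='region')
print(south)
```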
General rule: if you catch yourself writing a for-loop with .loc[] inside, there is almost always a vectorized alternative.
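To make that rule concrete, here is a sketch contrasting a per-row .loc[] loop with the equivalent single-mask assignment (illustrative data and hypothetical labels):

```python
import pandas as pd

df = pd.DataFrame({'salary': [95000, 72000, 88000],
                   'dept': ['Eng', 'Mkt', 'Eng']},
                  index=['e001', 'e002', 'e003'])

# Slow anti-pattern: one .loc[] call per row inside a Python loop
slow = df.copy()
for label in slow.index:
    if slow.loc[label, 'dept'] == 'Eng':
        slow.loc[label, 'salary'] = int(slow.loc[label, 'salary'] * 1.1)

# Vectorized equivalent: one boolean mask, one .loc[] assignment
fast = df.copy()
mask = fast['dept'] == 'Eng'
fast.loc[mask, 'salary'] = (fast.loc[mask, 'salary'] * 1.1).astype(int)

print(slow.equals(fast))  # same result, far fewer Python-level steps
```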
Real-World Example: Filtering and Updating a Sales DataFrame
Here is a realistic workflow that combines multiple .loc[] patterns.
import pandas as pd
import numpy as np
# Simulated sales data
np.random.seed(42)
n = 1000
sales = pd.DataFrame({
'order_id': range(1, n + 1),
'region': np.random.choice(['East', 'West', 'North', 'South'], n),
'product': np.random.choice(['Widget', 'Gadget', 'Doohickey'], n),
'quantity': np.random.randint(1, 50, n),
'unit_price': np.random.uniform(5, 100, n).round(2),
'status': np.random.choice(['completed', 'pending', 'cancelled'], n, p=[0.7, 0.2, 0.1])
})
sales['total'] = sales['quantity'] * sales['unit_price']
sales = sales.set_index('order_id')
# 1. View completed orders in the East region
east_completed = sales.loc[
(sales['region'] == 'East') & (sales['status'] == 'completed'),
['product', 'quantity', 'total']
]
print(f"East completed orders: {len(east_completed)}")
# 2. Flag high-value orders (total > 2000)
sales.loc[sales['total'] > 2000, 'tier'] = 'premium'
sales.loc[sales['total'] <= 2000, 'tier'] = 'standard'
# 3. Apply a 5% discount to pending Widget orders
mask = (sales['status'] == 'pending') & (sales['product'] == 'Widget')
sales.loc[mask, 'total'] = (sales.loc[mask, 'total'] * 0.95).round(2)
# 4. Summary of premium orders by region
premium = sales.loc[sales['tier'] == 'premium']
print(premium.groupby('region')['total'].agg(['count', 'mean', 'sum']).round(2))

This pattern -- filter, update, re-filter -- is the bread and butter of pandas data wrangling, and .loc[] handles every step.
Visualize Your Selections with PyGWalker
After selecting and filtering data with .loc[], visual exploration helps you validate the results and find patterns faster. PyGWalker turns any pandas DataFrame into an interactive Tableau-like interface inside Jupyter notebooks. Instead of writing separate plotting code for each question, you drag and drop columns to build charts instantly.
import pygwalker as pyg
# Select a subset with loc and visualize it
filtered = sales.loc[
(sales['region'].isin(['East', 'West'])) & (sales['status'] == 'completed')
]
pyg.walk(filtered)

PyGWalker is particularly useful after .loc[] operations because you can visually cross-check that your filters returned the expected data, then explore distributions and outliers without writing additional code.
FAQ
What does pandas loc do?
pandas .loc[] is a label-based indexer for selecting rows and columns from a DataFrame using index labels, column names, boolean arrays, or callables. It returns a subset of the DataFrame matching the specified labels. Unlike .iloc[], which uses integer positions, .loc[] uses the actual names in the index and column headers.
What is the difference between loc and iloc in pandas?
.loc[] uses label-based indexing and is inclusive on both ends of a slice. .iloc[] uses integer position-based indexing and is exclusive on the end of a slice (like standard Python). For example, df.loc['a':'c'] includes rows labeled 'a', 'b', and 'c', while df.iloc[0:3] includes positions 0, 1, and 2 only. Use .loc[] when you know the label names; use .iloc[] when you know the numeric positions.
How do I filter rows with multiple conditions using loc?
Combine boolean conditions with & (and), | (or), and ~ (not), wrapping each condition in parentheses. For example: df.loc[(df['age'] > 30) & (df['city'] == 'NYC'), ['name', 'salary']]. Do not use Python's and/or keywords, as they cannot operate on pandas Series element-wise and will raise a ValueError.
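A short sketch of the failure mode, using throwaway age and city columns mirroring the FAQ example:

```python
import pandas as pd

df = pd.DataFrame({'age': [25, 35, 45], 'city': ['NYC', 'NYC', 'LA']})

# Element-wise & with parenthesized conditions -- works
both = df.loc[(df['age'] > 30) & (df['city'] == 'NYC')]
print(len(both))  # only the 35-year-old in NYC matches

# Python's `and` tries to reduce each Series to a single bool -- ValueError
try:
    df.loc[(df['age'] > 30) and (df['city'] == 'NYC')]
except ValueError as err:
    print('ValueError:', err)
```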
How do I set values with loc without getting SettingWithCopyWarning?
Always use .loc[] directly on the original DataFrame in a single indexing operation. Write df.loc[df['col'] > 5, 'col2'] = new_value instead of df[df['col'] > 5]['col2'] = new_value. The chained form creates an intermediate copy, so the assignment may not propagate to the original DataFrame.
Can I use loc with a MultiIndex DataFrame?
Yes. Pass a tuple of labels to access specific levels: df.loc[('level1_val', 'level2_val')]. For partial indexing on the first level, pass just the first-level label: df.loc['level1_val']. For advanced cross-section slicing across levels, use pd.IndexSlice: df.loc[pd.IndexSlice[:, 'level2_val'], :].
Conclusion
pandas .loc[] is the go-to accessor for label-based DataFrame selection. It handles single-cell reads, multi-row slicing, boolean filtering with compound conditions, in-place value assignment, and MultiIndex lookups -- all with a consistent df.loc[rows, columns] syntax.
Key points to remember:
- Slicing is inclusive on both ends, unlike .iloc[] and standard Python
- Use .loc[] for assignment to avoid SettingWithCopyWarning
- Wrap compound conditions in parentheses and use &, |, ~ instead of and, or, not
- Use .at[] for single-cell access when performance matters
- Callables enable clean filtering inside method chains
For interactive exploration of your .loc[] results, PyGWalker lets you drag-and-drop columns to build visualizations directly from any filtered DataFrame -- no additional plotting code required.