Pandas iterrows(): How to Iterate Over DataFrame Rows (And When Not To)
Every data scientist hits the same wall. You have a pandas DataFrame, you need to process each row with some custom logic, and the first thing that comes to mind is a loop. A quick search leads you to iterrows() -- the built-in method that lets you iterate over DataFrame rows as (index, Series) pairs. It works. It reads well. And on a 100-row test dataset, it finishes instantly.
Then you run it on your actual dataset with 500,000 rows. Minutes pass. Your notebook cell is still spinning. What happened?
The problem is not that row iteration is inherently wrong. The problem is that iterrows() carries hidden overhead that makes it 100-1000x slower than the alternatives pandas was designed around. Understanding exactly what iterrows does under the hood, when it is appropriate, and what to use instead separates fast, production-ready code from notebooks that time out on real data.
This guide covers everything you need to know about iterrows(): how it works, why it is slow, and the concrete alternatives that solve the same problems in a fraction of the time.
What iterrows() Returns
The iterrows() method is a generator that yields pairs of (index, Series) for each row in a DataFrame. Each row is converted into a pandas Series object with the column names as the index.
import pandas as pd
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'age': [28, 34, 22, 45],
    'salary': [72000, 85000, 55000, 120000],
    'department': ['Engineering', 'Marketing', 'Engineering', 'Executive']
})
for index, row in df.iterrows():
    print(f"Index: {index}, Name: {row['name']}, Age: {row['age']}")

Output:
Index: 0, Name: Alice, Age: 28
Index: 1, Name: Bob, Age: 34
Index: 2, Name: Charlie, Age: 22
Index: 3, Name: Diana, Age: 45

Each row is a pandas Series:
for index, row in df.iterrows():
    print(type(row))
    print(row)
    break  # Just show the first row

Output:
<class 'pandas.core.series.Series'>
name Alice
age 28
salary 72000
department Engineering
Name: 0, dtype: object

Notice the dtype: object. This is the first clue to why iterrows is slow -- but more on that shortly.
Basic iterrows() Usage Patterns
Accessing Column Values
You can access values in each row using dictionary-style bracket notation or dot notation:
import pandas as pd
df = pd.DataFrame({
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
    'price': [999.99, 29.99, 79.99, 349.99],
    'stock': [15, 200, 85, 42]
})
for index, row in df.iterrows():
    # Bracket notation (recommended)
    revenue_potential = row['price'] * row['stock']
    print(f"{row['product']}: ${revenue_potential:,.2f}")

Output:
Laptop: $14,999.85
Mouse: $5,998.00
Keyboard: $6,799.15
Monitor: $14,699.58

Building a List from Row Data
A common use case is constructing a new list or dictionary from row-level computations:
import pandas as pd
df = pd.DataFrame({
    'first_name': ['John', 'Jane', 'Bob'],
    'last_name': ['Smith', 'Doe', 'Johnson'],
    'email_domain': ['gmail.com', 'company.org', 'outlook.com']
})
emails = []
for index, row in df.iterrows():
    email = f"{row['first_name'].lower()}.{row['last_name'].lower()}@{row['email_domain']}"
    emails.append(email)
df['email'] = emails
print(df)

Output:
first_name last_name email_domain email
0 John Smith gmail.com john.smith@gmail.com
1 Jane Doe company.org jane.doe@company.org
2 Bob Johnson outlook.com bob.johnson@outlook.com

Conditional Logic Per Row
import pandas as pd
df = pd.DataFrame({
    'student': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'math_score': [92, 67, 85, 45],
    'english_score': [78, 88, 90, 72]
})
results = []
for index, row in df.iterrows():
    avg = (row['math_score'] + row['english_score']) / 2
    if avg >= 85:
        results.append('Honors')
    elif avg >= 70:
        results.append('Pass')
    else:
        results.append('Needs Improvement')
df['status'] = results
print(df)

Output:
student math_score english_score status
0 Alice 92 78 Honors
1 Bob 67 88 Pass
2 Charlie 85 90 Honors
3 Diana 45 72 Needs Improvement

Why iterrows() Is Slow
Understanding the performance problem requires knowing what happens internally on each iteration:
1. Series Object Creation Overhead
Every single iteration creates a brand-new pandas Series object. For a DataFrame with 1 million rows, that means 1 million Series objects are allocated and garbage collected. Series creation involves memory allocation, index construction, and metadata setup -- none of which are free.
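To make the allocation cost concrete, here is a small sketch (not part of the benchmark below) showing that each iteration yields a distinct, independent Series:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'name': ['x', 'y', 'z']})

# Collect the yielded rows: each one is a freshly allocated Series
rows = [row for _, row in df.iterrows()]
assert len({id(r) for r in rows}) == 3  # three distinct objects

# The copies are independent of the DataFrame
rows[0]['a'] = 999
assert df.loc[0, 'a'] == 1  # df is unchanged
```

For three rows this is invisible; for a million rows, a million short-lived Series keep the allocator and garbage collector busy.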
2. Type Casting to Object dtype
When iterrows converts a row into a Series, it must find a single dtype that accommodates all column types. If your DataFrame has integers, floats, and strings (which most do), the only common dtype is object. This forces numeric values to be boxed as Python objects, losing the performance benefits of NumPy's contiguous memory layout.
import pandas as pd
df = pd.DataFrame({
    'int_col': [1, 2, 3],
    'float_col': [1.5, 2.5, 3.5],
    'str_col': ['a', 'b', 'c']
})
print(f"DataFrame dtypes:\n{df.dtypes}\n")
for index, row in df.iterrows():
    print(f"Row dtype: {row.dtype}")
    print(f"int_col type: {type(row['int_col'])}")
    break

Output:
DataFrame dtypes:
int_col int64
float_col float64
str_col object
dtype: object
Row dtype: object
int_col type: <class 'int'>

The integer column that was stored as an efficient int64 NumPy array is now a boxed Python int object. This conversion happens for every row, every iteration.
3. Python-Level Loop Overhead
pandas is built on NumPy, which operates on entire arrays in compiled C code. When you use iterrows, you abandon this advantage and process data one element at a time in the Python interpreter. The Python interpreter adds overhead for each operation: function calls, dynamic type checking, attribute lookups -- all multiplied by the number of rows.
Performance Benchmark
Here is a concrete benchmark comparing iteration approaches:
import pandas as pd
import numpy as np
import timeit
# Create a benchmark DataFrame
n_rows = 100_000
df = pd.DataFrame({
    'a': np.random.randn(n_rows),
    'b': np.random.randn(n_rows),
    'c': np.random.randint(1, 100, n_rows)
})

# Operation: compute a * b + c for each row

# Method 1: iterrows
def method_iterrows():
    results = []
    for idx, row in df.iterrows():
        results.append(row['a'] * row['b'] + row['c'])
    return results

# Method 2: itertuples
def method_itertuples():
    results = []
    for row in df.itertuples():
        results.append(row.a * row.b + row.c)
    return results

# Method 3: apply
def method_apply():
    return df.apply(lambda row: row['a'] * row['b'] + row['c'], axis=1)

# Method 4: vectorized
def method_vectorized():
    return df['a'] * df['b'] + df['c']

# Benchmark each method (3 runs)
for name, func in [
    ('iterrows', method_iterrows),
    ('itertuples', method_itertuples),
    ('apply', method_apply),
    ('vectorized', method_vectorized),
]:
    elapsed = timeit.timeit(func, number=3) / 3
    print(f"{name:15s}: {elapsed:.4f} seconds")

Typical output on a modern machine (100,000 rows):
iterrows : 4.5200 seconds
itertuples : 0.1580 seconds
apply : 1.8900 seconds
vectorized : 0.0008 seconds

Performance Comparison Table
| Method | Speed (100K rows) | Memory Overhead | Type Safety | Readability |
|---|---|---|---|---|
| iterrows() | ~4.5s (1x) | High (Series per row) | Poor (casts to object) | High |
| itertuples() | ~0.16s (28x faster) | Low (namedtuples) | Good (preserves dtypes) | Medium |
| apply(axis=1) | ~1.9s (2.4x faster) | Medium | Poor (casts to object) | High |
| Vectorized ops | ~0.001s (5000x faster) | Minimal | Excellent | Medium |
| np.where() | ~0.001s (5000x faster) | Minimal | Excellent | Medium |
| np.vectorize() | ~0.08s (56x faster) | Low | Good | Medium |
The key takeaway: vectorized operations are not marginally faster -- they are orders of magnitude faster. For 1 million rows, the difference is between 0.01 seconds and 45 seconds.
iterrows() vs itertuples()
If you must iterate row by row, itertuples() is almost always the better choice. Here is why:
import pandas as pd
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [28, 34, 22],
    'salary': [72000, 85000, 55000]
})
# iterrows: returns (index, Series)
print("=== iterrows ===")
for index, row in df.iterrows():
    print(f"Type: {type(row)}, Age type: {type(row['age'])}")
    break
# itertuples: returns namedtuples
print("\n=== itertuples ===")
for row in df.itertuples():
    print(f"Type: {type(row)}, Age type: {type(row.age)}")
    break

Output:
=== iterrows ===
Type: <class 'pandas.core.series.Series'>, Age type: <class 'int'>
=== itertuples ===
Type: <class 'pandas.core.frame.Pandas'>, Age type: <class 'numpy.int64'>

Key differences:
| Feature | iterrows() | itertuples() |
|---|---|---|
| Returns | (index, Series) | Named tuples |
| Speed | Slow (Series creation overhead) | 20-30x faster |
| dtype preservation | Casts to object dtype | Preserves original dtypes |
| Access pattern | row['column_name'] | row.column_name |
| Index access | First element of tuple | row.Index |
| Column names with spaces | Works fine | Renamed to positional |
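One caveat from the table above: itertuples renames columns that are not valid Python identifiers to positional names. A quick sketch (the 'unit price' column is a made-up example):

```python
import pandas as pd

df = pd.DataFrame({'unit price': [1.5, 2.0], 'qty': [2, 3]})

rows = list(df.itertuples())
# 'unit price' is not a valid identifier, so it becomes _1
# (field 0 is the Index, field 1 is 'unit price')
print(rows[0]._1, rows[0].qty)
```

If you rely on such column names inside the loop, either rename the columns first or fall back to iterrows.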
When to choose itertuples over iterrows
import pandas as pd
df = pd.DataFrame({
    'product': ['Widget', 'Gadget', 'Tool'],
    'price': [9.99, 24.99, 14.99],
    'quantity': [100, 50, 200]
})
# itertuples is faster and preserves types
for row in df.itertuples():
    revenue = row.price * row.quantity
    print(f"{row.product}: ${revenue:.2f}")
# Use index=False to drop the Index field
for row in df.itertuples(index=False):
    print(row)

Output:
Widget: $999.00
Gadget: $1249.50
Tool: $2998.00
Pandas(product='Widget', price=9.99, quantity=100)
Pandas(product='Gadget', price=24.99, quantity=50)
Pandas(product='Tool', price=14.99, quantity=200)

iterrows() vs apply()
Both iterrows() and apply(axis=1) process data row by row, but they differ in API design and speed:
import pandas as pd
df = pd.DataFrame({
    'base_price': [100, 200, 150],
    'tax_rate': [0.08, 0.10, 0.06],
    'discount': [0.05, 0.15, 0.10]
})
# Using iterrows
results_iterrows = []
for idx, row in df.iterrows():
    final = row['base_price'] * (1 + row['tax_rate']) * (1 - row['discount'])
    results_iterrows.append(final)
df['final_iterrows'] = results_iterrows
# Using apply
df['final_apply'] = df.apply(
    lambda row: row['base_price'] * (1 + row['tax_rate']) * (1 - row['discount']),
    axis=1
)
print(df[['final_iterrows', 'final_apply']])

Output:
final_iterrows final_apply
0 102.600 102.600
1 187.000 187.000
2 143.100 143.100

apply() is typically 2-3x faster than iterrows() because its internal row iteration is implemented more efficiently, but it shares the same fundamental problem: each row is still materialized as a Series and processed one at a time in Python. For this specific operation, the vectorized version is 1000x faster:
# Vectorized -- the right way
df['final_vectorized'] = df['base_price'] * (1 + df['tax_rate']) * (1 - df['discount'])

Vectorized Alternatives: The Right Way
For most operations people use iterrows() for, a vectorized alternative exists that runs dramatically faster.
Arithmetic Operations
import pandas as pd
df = pd.DataFrame({
    'price': [10.0, 20.0, 30.0, 40.0],
    'quantity': [5, 3, 8, 2],
    'tax_rate': [0.08, 0.10, 0.08, 0.12]
})
# SLOW: iterrows
totals = []
for idx, row in df.iterrows():
    totals.append(row['price'] * row['quantity'] * (1 + row['tax_rate']))
# FAST: vectorized
df['total'] = df['price'] * df['quantity'] * (1 + df['tax_rate'])
print(df)

Conditional Logic with np.where and np.select
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'score': [92, 67, 85, 45, 73, 98]
})
# SLOW: iterrows for conditional
grades = []
for idx, row in df.iterrows():
    if row['score'] >= 90:
        grades.append('A')
    elif row['score'] >= 80:
        grades.append('B')
    elif row['score'] >= 70:
        grades.append('C')
    else:
        grades.append('F')
# FAST: np.select for multiple conditions
conditions = [
    df['score'] >= 90,
    df['score'] >= 80,
    df['score'] >= 70
]
choices = ['A', 'B', 'C']
df['grade'] = np.select(conditions, choices, default='F')
print(df)

Output:
score grade
0 92 A
1 67 F
2 85 B
3 45 F
4 73 C
5 98 A

String Operations
import pandas as pd
df = pd.DataFrame({
    'first': ['john', 'jane', 'bob'],
    'last': ['SMITH', 'DOE', 'JOHNSON']
})
# SLOW: iterrows
full_names = []
for idx, row in df.iterrows():
    full_names.append(f"{row['first'].title()} {row['last'].title()}")
# FAST: vectorized string operations
df['full_name'] = df['first'].str.title() + ' ' + df['last'].str.title()
print(df)

Output:
first last full_name
0 john SMITH John Smith
1 jane DOE Jane Doe
2 bob JOHNSON Bob Johnson

Window and Rolling Calculations
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'date': pd.date_range('2026-01-01', periods=10),
    'value': [10, 12, 15, 14, 18, 20, 19, 22, 25, 23]
})
# SLOW: iterrows for rolling average
rolling_avg = []
for idx, row in df.iterrows():
    if idx < 2:
        rolling_avg.append(np.nan)
    else:
        avg = df.loc[idx-2:idx, 'value'].mean()
        rolling_avg.append(avg)
# FAST: built-in rolling
df['rolling_avg'] = df['value'].rolling(window=3).mean()
print(df)

Lookup / Mapping Operations
import pandas as pd
df = pd.DataFrame({
    'department_code': ['ENG', 'MKT', 'ENG', 'EXE', 'MKT']
})
dept_names = {
    'ENG': 'Engineering',
    'MKT': 'Marketing',
    'EXE': 'Executive'
}
# SLOW: iterrows
names = []
for idx, row in df.iterrows():
    names.append(dept_names.get(row['department_code'], 'Unknown'))
# FAST: map
df['department'] = df['department_code'].map(dept_names).fillna('Unknown')
print(df)

When iterrows() IS Appropriate
Despite its performance drawbacks, iterrows has legitimate use cases:
1. Small DataFrames (Under ~1,000 Rows)
When the dataset is small, the performance difference is negligible. If iterrows makes your code clearer, use it:
import pandas as pd
# Configuration table with 5 rows -- iterrows is fine
config = pd.DataFrame({
    'setting': ['timeout', 'retries', 'batch_size', 'debug', 'log_level'],
    'value': ['30', '3', '100', 'true', 'INFO']
})
settings = {}
for idx, row in config.iterrows():
    settings[row['setting']] = row['value']

2. Complex Stateful Logic
When each row's processing depends on the results of previous rows, vectorization becomes difficult or impossible:
import pandas as pd
df = pd.DataFrame({
    'transaction': ['deposit', 'withdrawal', 'deposit', 'withdrawal', 'deposit'],
    'amount': [1000, 300, 500, 200, 800]
})
# Running balance that depends on previous state
balance = 0
balances = []
for idx, row in df.iterrows():
    if row['transaction'] == 'deposit':
        balance += row['amount']
    else:
        balance -= row['amount']
    balances.append(balance)
df['balance'] = balances
print(df)

Output:
transaction amount balance
0 deposit 1000 1000
1 withdrawal 300 700
2 deposit 500 1200
3 withdrawal 200 1000
4 deposit 800 1800

Note: even for this case, cumsum() with conditional signs would be faster:
import numpy as np
signs = np.where(df['transaction'] == 'deposit', 1, -1)
df['balance_fast'] = (df['amount'] * signs).cumsum()

3. Debugging and Exploration
When you need to inspect what is happening row by row, iterrows provides a natural debugging interface:
import pandas as pd
df = pd.DataFrame({
    'value': [10, -5, 'invalid', 30, None]
})
# Debug: find problematic rows
for idx, row in df.iterrows():
    try:
        result = float(row['value']) * 2
    except (ValueError, TypeError) as e:
        print(f"Row {idx}: Error processing '{row['value']}' -- {e}")

4. External API Calls or I/O Per Row
When each row triggers an API call, database query, or file operation, the I/O latency dwarfs the iteration overhead:
import pandas as pd
urls = pd.DataFrame({
    'endpoint': ['/api/users/1', '/api/users/2', '/api/users/3'],
    'method': ['GET', 'GET', 'GET']
})
# API calls dominate runtime -- iterrows overhead is irrelevant
# for idx, row in urls.iterrows():
#     response = requests.get(base_url + row['endpoint'])
#     # process response

Common Mistakes with iterrows()
Mistake 1: Modifying the DataFrame During Iteration
This is the most dangerous pitfall. Changes made to row do not propagate back to the original DataFrame:
import pandas as pd
df = pd.DataFrame({'value': [1, 2, 3]})
# WRONG: This does NOT modify df
for idx, row in df.iterrows():
    row['value'] = row['value'] * 10  # Modifies the copy, not df!
print(df)
# Output: unchanged!
# value
# 0 1
# 1 2
# 2 3

If you need to modify the DataFrame during iteration (which you usually should not), use df.at[] or df.loc[]:
import pandas as pd
df = pd.DataFrame({'value': [1, 2, 3]})
# Works but slow -- use vectorized ops instead
for idx, row in df.iterrows():
    df.at[idx, 'value'] = row['value'] * 10
print(df)
# Output:
# value
# 0 10
# 1 20
# 30

The correct approach:
# BEST: vectorized
df['value'] = df['value'] * 10

Mistake 2: Using iterrows When Column Types Matter
Because iterrows casts to object dtype, you can get unexpected type behavior:
import pandas as pd
df = pd.DataFrame({
    'int_col': [1, 2, 3],
    'float_col': [1.0, 2.0, 3.0]
})
for idx, row in df.iterrows():
    # int_col might be returned as float!
    print(f"int_col: {row['int_col']}, type: {type(row['int_col'])}")
    break

This can cause subtle bugs when type precision matters (e.g., comparing integer IDs).
Mistake 3: Appending to DataFrame Inside Loop
import pandas as pd
# TERRIBLE: Quadratic performance -- each append copies the entire DataFrame
df = pd.DataFrame(columns=['a', 'b'])
for i in range(1000):
    df = pd.concat([df, pd.DataFrame({'a': [i], 'b': [i*2]})], ignore_index=True)
# CORRECT: Build a list first, then create DataFrame once
rows = []
for i in range(1000):
    rows.append({'a': i, 'b': i * 2})
df = pd.DataFrame(rows)

Real-World Example: Cleaning and Transforming Survey Data
Here is a realistic scenario that combines multiple concepts:
import pandas as pd
import numpy as np
# Raw survey data with messy responses
survey = pd.DataFrame({
    'respondent': ['R001', 'R002', 'R003', 'R004', 'R005'],
    'age': ['25', 'thirty', '42', '19', '55+'],
    'satisfaction': [8, 9, -1, 7, 11],
    'comment': ['Great!', '', 'N/A', 'Good service', None]
})

# ====== APPROACH 1: iterrows (readable but slow) ======
cleaned_rows = []
for idx, row in survey.iterrows():
    clean = {}
    clean['respondent'] = row['respondent']
    # Parse age with error handling
    try:
        clean['age'] = int(row['age'])
    except ValueError:
        clean['age'] = np.nan
    # Clamp satisfaction to valid range
    sat = row['satisfaction']
    clean['satisfaction'] = sat if 1 <= sat <= 10 else np.nan
    # Normalize comments
    comment = row['comment']
    if pd.isna(comment) or comment.strip() in ('', 'N/A', 'n/a'):
        clean['has_comment'] = False
    else:
        clean['has_comment'] = True
    cleaned_rows.append(clean)
cleaned_df = pd.DataFrame(cleaned_rows)
print(cleaned_df)

# ====== APPROACH 2: vectorized (fast) ======
survey_v = survey.copy()
survey_v['age_clean'] = pd.to_numeric(survey_v['age'], errors='coerce')
survey_v['satisfaction_clean'] = survey_v['satisfaction'].where(
    survey_v['satisfaction'].between(1, 10)
)
survey_v['has_comment'] = (
    survey_v['comment'].notna() &
    ~survey_v['comment'].fillna('').str.strip().isin(['', 'N/A', 'n/a'])
)
print(survey_v[['respondent', 'age_clean', 'satisfaction_clean', 'has_comment']])

Output:
respondent age satisfaction has_comment
0 R001 25.0 8.0 True
1 R002 NaN 9.0 False
2 R003 42.0 NaN False
3 R004 19.0 7.0 True
4 R005 NaN NaN False

Both approaches produce identical results. On 100,000 rows, the vectorized version runs in milliseconds while iterrows takes seconds.
Visualize Your Data with PyGWalker
After cleaning and transforming your DataFrame -- whether through iterrows for small datasets or vectorized operations for large ones -- visualizing the results helps you validate transformations and discover patterns. PyGWalker turns any pandas DataFrame into an interactive, Tableau-style visual exploration interface directly inside Jupyter notebooks.
import pygwalker as pyg
# Explore your cleaned survey data interactively
walker = pyg.walk(cleaned_df)

With PyGWalker, you can drag and drop columns to build charts, filter by conditions, and explore distributions -- all without writing additional plotting code. This is especially useful when validating data cleaning pipelines, where iterrows or vectorized ops transform raw data into analysis-ready formats.
If you are working in Jupyter and want an AI-powered agent to help with data analysis tasks like these, check out RunCell -- an AI agent built for data scientists that runs directly in your notebook environment.
Quick Reference: Choosing the Right Iteration Method
Use this decision tree to pick the fastest approach for your situation:
- Can the operation be expressed as column arithmetic? Use vectorized operations (df['a'] + df['b'])
- Is it conditional assignment? Use np.where() or np.select()
- Is it a string operation? Use .str accessor methods
- Is it a mapping/lookup? Use .map() with a dictionary
- Is it a grouped aggregation? Use .groupby() with built-in aggregations
- Must you iterate, and types matter? Use itertuples()
- Must you iterate, and you need column access by name (including names with spaces)? Use iterrows()
- Debugging, or fewer than ~1,000 rows? iterrows() is fine
FAQ
What does pandas iterrows() return?
iterrows() returns a generator that yields (index, Series) pairs for each row in the DataFrame. The index is the row label, and the Series contains all column values for that row with column names as the Series index.
Is iterrows() slow in pandas?
Yes. iterrows() is one of the slowest ways to process DataFrame rows because it creates a new pandas Series object for each row, casts all values to Python objects, and operates in a Python-level loop instead of compiled C code. It is typically 100-5000x slower than vectorized operations.
What is the difference between iterrows() and itertuples()?
itertuples() returns lightweight namedtuples instead of Series objects, making it 20-30x faster than iterrows(). It also preserves column dtypes rather than casting everything to object. Use itertuples() whenever you need row-by-row iteration and performance matters.
How do I modify a DataFrame while using iterrows()?
You cannot modify the original DataFrame through the row variable returned by iterrows -- it is a copy. Use df.at[index, 'column'] = value inside the loop, or better yet, build a list and assign it after the loop. The fastest approach is to avoid iteration entirely and use vectorized operations.
When should I use iterrows() instead of vectorized operations?
Use iterrows when: (1) your DataFrame has fewer than ~1,000 rows and readability matters more than speed, (2) each row requires complex stateful logic that depends on previous rows, (3) you are debugging and need to inspect row-by-row processing, or (4) each row triggers an external API call or I/O operation where latency dominates runtime.
Can iterrows() change column data types?
Yes, and this is a common source of bugs. Because iterrows converts each row to a Series with a single dtype, mixed-type DataFrames (integers and strings) will have all values cast to object dtype. Integer columns may become floats. Use itertuples() if you need type-safe iteration.
Conclusion
The pandas iterrows() method provides a straightforward way to loop over DataFrame rows, and understanding its behavior is essential for any data scientist working with pandas. However, reaching for it by default is a performance anti-pattern that slows down data pipelines by orders of magnitude.
The hierarchy of approaches is clear: vectorized operations first, then itertuples() for necessary iteration, then apply() for complex row-level functions, and iterrows() only when debugging, working with tiny datasets, or handling stateful logic that defies vectorization. When you need to filter rows based on conditions, vectorized boolean indexing is always the right choice over iteration.
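For the filtering case mentioned above, a minimal sketch of boolean indexing:

```python
import pandas as pd

df = pd.DataFrame({'score': [92, 67, 85, 45]})

# Boolean mask selects matching rows with no explicit loop
passing = df[df['score'] >= 70]
print(list(passing['score']))  # [92, 85]
```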
Build the habit of writing vectorized code from the start. When you catch yourself writing for idx, row in df.iterrows():, pause and ask: can this be expressed as a column operation? Nine times out of ten, the answer is yes -- and the result will be cleaner, faster, and more idiomatic pandas.
Related Guides
- Pandas apply(): Row and Column Transformations
- Pandas GroupBy: Aggregation, Transform, Apply
- Pandas DataFrame loc: Label-Based Indexing
- Pandas Filter Rows: Select Data by Condition