
Python List Comprehension: Complete Guide with Examples and Performance Tips


You need to transform a list of values in Python. Maybe you are filtering rows from a dataset, extracting fields from a list of dictionaries, or converting strings to numbers. The classic approach is a for loop: initialize an empty list, iterate, append. It works, but it takes three or four lines to do what could be one. Multiply that across a data pipeline with dozens of transformations and your code becomes a wall of boilerplate that is harder to read, slower to write, and measurably slower to execute.

Python list comprehensions solve this by collapsing the loop-and-append pattern into a single, readable expression. They are more than syntactic sugar: CPython compiles them to bytecode that appends items directly, so they typically run faster than an equivalent for loop that calls append() (roughly 20-40% in the benchmarks below). They are among the most common idioms in Python, from small scripts to data science workflows, and understanding them well helps you write code that is both shorter and faster.


This guide covers everything about Python list comprehensions: the basic syntax, filtering with conditions, nested comprehensions, dictionary and set comprehensions, generator expressions, performance benchmarks against for loops and map(), real-world patterns for data processing, and the specific situations where list comprehensions are the wrong tool.

Basic Syntax of Python List Comprehension

A list comprehension creates a new list by applying an expression to each item in an iterable. Here is the general form:

[expression for item in iterable]

This is equivalent to:

result = []
for item in iterable:
    result.append(expression)

A simple example -- squaring every number in a list:

numbers = [1, 2, 3, 4, 5]
squares = [n ** 2 for n in numbers]
print(squares)
# [1, 4, 9, 16, 25]

The comprehension reads left to right: "give me n ** 2 for each n in numbers." There is no need to initialize an empty list, no append() call, and no extra indentation.

Here are a few more basic examples to build intuition:

# Convert strings to uppercase
names = ["alice", "bob", "charlie"]
upper_names = [name.upper() for name in names]
print(upper_names)
# ['ALICE', 'BOB', 'CHARLIE']
 
# Get the length of each word
words = ["python", "list", "comprehension"]
lengths = [len(w) for w in words]
print(lengths)
# [6, 4, 13]
 
# Extract the first character from each string
initials = [name[0] for name in ["Alice", "Bob", "Charlie"]]
print(initials)
# ['A', 'B', 'C']
 
# Create a list of tuples
pairs = [(x, x ** 2) for x in range(5)]
print(pairs)
# [(0, 0), (1, 1), (2, 4), (3, 9), (4, 16)]

Filtering with Conditions (if Clause)

Add an if clause to include only items that pass a condition:

[expression for item in iterable if condition]

This replaces the loop-with-conditional pattern:

# Traditional for loop with condition
result = []
for item in iterable:
    if condition:
        result.append(expression)

Practical examples:

# Keep only even numbers
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
evens = [n for n in numbers if n % 2 == 0]
print(evens)
# [2, 4, 6, 8, 10]
 
# Filter strings longer than 3 characters
words = ["hi", "hello", "hey", "greetings", "yo"]
long_words = [w for w in words if len(w) > 3]
print(long_words)
# ['hello', 'greetings']
 
# Extract positive numbers from mixed data
values = [10, -3, 0, 7, -1, 4, -8, 2]
positive = [v for v in values if v > 0]
print(positive)
# [10, 7, 4, 2]
 
# Filter and transform in one step
scores = [45, 82, 91, 67, 38, 95, 73]
high_scores_doubled = [s * 2 for s in scores if s >= 70]
print(high_scores_doubled)
# [164, 182, 190, 146]

if-else in List Comprehension

When you need to apply different transformations based on a condition (rather than filtering), put the conditional expression before the for:

[expression_if_true if condition else expression_if_false for item in iterable]

Note the difference: if after for filters items out; if-else before for transforms every item.

# Label numbers as even or odd
numbers = [1, 2, 3, 4, 5]
labels = ["even" if n % 2 == 0 else "odd" for n in numbers]
print(labels)
# ['odd', 'even', 'odd', 'even', 'odd']
 
# Clamp values to a range (nested min/max replaces a chained if-else here)
raw = [150, -10, 75, 200, 50, -30, 100]
clamped = [max(0, min(v, 100)) for v in raw]
print(clamped)
# [100, 0, 75, 100, 50, 0, 100]
 
# Replace None with a default
data = ["Alice", None, "Charlie", None, "Eve"]
cleaned = [name if name is not None else "Unknown" for name in data]
print(cleaned)
# ['Alice', 'Unknown', 'Charlie', 'Unknown', 'Eve']
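The two forms can also be combined in one comprehension: the if-else expression before for decides what each kept item becomes, and the if after for decides which items are kept at all. A small sketch with made-up sample data:

```python
# Drop None entries, then replace non-positive readings with 0 (sample data is made up)
readings = [4, None, -2, 7, None, 0]
cleaned = [r if r > 0 else 0 for r in readings if r is not None]
print(cleaned)
# [4, 0, 7, 0]
```

The filter runs first, so r > 0 is never evaluated on a None value.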

Multiple Conditions

Chain multiple conditions with and / or:

# Numbers divisible by both 2 and 3
numbers = range(1, 31)
div_by_2_and_3 = [n for n in numbers if n % 2 == 0 and n % 3 == 0]
print(div_by_2_and_3)
# [6, 12, 18, 24, 30]
 
# Strings that start with 'p' or have length > 5
words = ["python", "java", "perl", "javascript", "go", "php"]
filtered = [w for w in words if w.startswith("p") or len(w) > 5]
print(filtered)
# ['python', 'perl', 'javascript', 'php']

Nested List Comprehensions

A nested list comprehension has multiple for clauses. The order matches how you would write nested for loops:

[expression for outer in outer_iterable for inner in inner_iterable]

This is equivalent to:

result = []
for outer in outer_iterable:
    for inner in inner_iterable:
        result.append(expression)

Flattening a List of Lists

The most common use case for nested comprehensions:

matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = [num for row in matrix for num in row]
print(flat)
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

Read it as: "for each row in matrix, for each num in row, give me num."
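Writing the same flattening as explicit loops makes the clause order visible: the first for in the comprehension is the outer loop and the second is the inner one. A quick check that both produce the same list:

```python
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Nested loops in the same order as the comprehension's for clauses
flat_loop = []
for row in matrix:        # first for clause: outer loop
    for num in row:       # second for clause: inner loop
        flat_loop.append(num)

flat_comp = [num for row in matrix for num in row]
print(flat_loop == flat_comp)
# True
```

Reversing the two for clauses would make the comprehension use row before it has been defined.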

Generating Combinations

colors = ["red", "green", "blue"]
sizes = ["S", "M", "L"]
combos = [(color, size) for color in colors for size in sizes]
print(combos)
# [('red', 'S'), ('red', 'M'), ('red', 'L'),
#  ('green', 'S'), ('green', 'M'), ('green', 'L'),
#  ('blue', 'S'), ('blue', 'M'), ('blue', 'L')]
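For plain Cartesian products, the standard library's itertools.product yields the same tuples in the same order, with the rightmost input varying fastest, just like the inner for clause:

```python
from itertools import product

colors = ["red", "green", "blue"]
sizes = ["S", "M", "L"]

combos = [(color, size) for color in colors for size in sizes]
# product() matches the comprehension tuple for tuple
print(list(product(colors, sizes)) == combos)
# True
```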

Creating a Matrix (Nested Comprehension Inside)

You can also nest a comprehension inside another to create 2D structures:

# Create a 3x4 matrix of zeros
matrix = [[0 for col in range(4)] for row in range(3)]
print(matrix)
# [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
 
# Transpose a matrix
original = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
transposed = [[row[i] for row in original] for i in range(len(original[0]))]
print(transposed)
# [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
 
# Multiplication table
table = [[i * j for j in range(1, 6)] for i in range(1, 6)]
for row in table:
    print(row)
# [1, 2, 3, 4, 5]
# [2, 4, 6, 8, 10]
# [3, 6, 9, 12, 15]
# [4, 8, 12, 16, 20]
# [5, 10, 15, 20, 25]
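The transpose can also be written with zip(), which groups the i-th element of every row into a tuple; the comprehension then turns each tuple into a list:

```python
original = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# zip(*original) unpacks the rows as separate arguments and yields columns as tuples
transposed = [list(col) for col in zip(*original)]
print(transposed)
# [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
```

The two versions differ on ragged input: zip() stops at the shortest row, while the index-based version raises IndexError when a later row is shorter than the first.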

Nested Comprehension with Filtering

# Flatten and filter in one step
matrix = [[1, -2, 3], [-4, 5, -6], [7, -8, 9]]
positive_flat = [num for row in matrix for num in row if num > 0]
print(positive_flat)
# [1, 3, 5, 7, 9]

Dictionary and Set Comprehensions

Python extends the comprehension syntax beyond lists.

Dictionary Comprehension

{key_expression: value_expression for item in iterable}

# Create a dictionary from two lists
names = ["Alice", "Bob", "Charlie"]
scores = [88, 95, 72]
grade_book = {name: score for name, score in zip(names, scores)}
print(grade_book)
# {'Alice': 88, 'Bob': 95, 'Charlie': 72}
 
# Square numbers as key-value pairs
squares = {n: n ** 2 for n in range(1, 6)}
print(squares)
# {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
 
# Filter a dictionary
prices = {"apple": 1.20, "banana": 0.50, "cherry": 2.00, "date": 3.50}
affordable = {k: v for k, v in prices.items() if v < 2.0}
print(affordable)
# {'apple': 1.2, 'banana': 0.5}
 
# Swap keys and values (values must be unique and hashable; on duplicates the last key wins)
original = {"a": 1, "b": 2, "c": 3}
swapped = {v: k for k, v in original.items()}
print(swapped)
# {1: 'a', 2: 'b', 3: 'c'}
 
# Word frequency counter
sentence = "the cat sat on the mat the cat"
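words_in_sentence = sentence.split()  # split once, not once per word
# dict.fromkeys() removes duplicates but keeps first-seen order (a set would not)
word_freq = {word: words_in_sentence.count(word) for word in dict.fromkeys(words_in_sentence)}
print(word_freq)
# {'the': 3, 'cat': 2, 'sat': 1, 'on': 1, 'mat': 1}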
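When the goal is counting, collections.Counter from the standard library tallies every word in a single pass, whereas calling count() inside a comprehension rescans the word list once per distinct word. A sketch of the same word frequency task:

```python
from collections import Counter

sentence = "the cat sat on the mat the cat"
words = sentence.split()

# Counter is a dict subclass mapping each word to its number of occurrences
word_freq = Counter(words)
print(word_freq.most_common(2))
# [('the', 3), ('cat', 2)]
```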

Set Comprehension

{expression for item in iterable}

Set comprehensions produce a set -- unordered, with no duplicates:

# Get unique word lengths
words = ["hello", "world", "hi", "python", "code", "hey"]
unique_lengths = {len(w) for w in words}
print(unique_lengths)
# {2, 3, 4, 5, 6}
 
# Extract unique first letters
names = ["Alice", "Anna", "Bob", "Brian", "Charlie"]
first_letters = {name[0] for name in names}
print(first_letters)
# {'A', 'B', 'C'}  (set order may vary between runs)

List Comprehension vs Generator Expression

A generator expression looks almost identical to a list comprehension but uses parentheses instead of brackets:

# List comprehension -- builds entire list in memory
squares_list = [n ** 2 for n in range(1_000_000)]
 
# Generator expression -- produces values lazily, one at a time
squares_gen = (n ** 2 for n in range(1_000_000))

The key differences:

| Feature | List comprehension [] | Generator expression () |
|---|---|---|
| Memory | Stores all values at once | Produces one value at a time |
| Speed (single pass) | Often slightly faster for small inputs | Small per-item overhead from resuming the generator |
| Reusable | Yes, iterate multiple times | No, exhausted after one pass |
| Supports indexing | Yes (result[3]) | No |
| Best for | Small-to-medium data, random access | Large data, streaming, pipelines |
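The "Reusable" difference is the one most likely to cause silent bugs: a generator expression can be consumed only once, and a second pass simply yields nothing. A small demonstration:

```python
squares_gen = (n ** 2 for n in range(4))

print(list(squares_gen))  # [0, 1, 4, 9]
print(list(squares_gen))  # [] -- the generator is already exhausted

squares_list = [n ** 2 for n in range(4)]
print(squares_list[2])    # 4 -- lists support indexing
print(sum(squares_list), sum(squares_list))  # 14 14 -- and can be iterated repeatedly
```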

When to use each:

# Use a list comprehension when you need the full list
data = [x ** 2 for x in range(100)]
print(data[50])  # Random access works
print(sum(data))  # Can reuse
 
# Use a generator expression when you only iterate once
# Especially for large datasets or chained operations
total = sum(x ** 2 for x in range(10_000_000))  # No intermediate list
print(total)
 
# A generator passed as a function's only argument needs no extra parentheses
max_square = max(x ** 2 for x in range(100))
has_even = any(x % 2 == 0 for x in range(10))
joined = ", ".join(str(x) for x in range(5))
print(joined)
# 0, 1, 2, 3, 4

For data pipelines that process millions of rows, generator expressions keep memory use roughly flat rather than proportional to the number of rows, because only the current item is held at any moment.
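A rough way to see the difference is sys.getsizeof(). Note that it reports only the container's own footprint (for a list, its array of references), not the integer objects it points to, so the real gap is larger than this sketch shows:

```python
import sys

n = 1_000_000
squares_list = [x ** 2 for x in range(n)]
squares_gen = (x ** 2 for x in range(n))

# The list's own size grows with n (about 8 bytes per reference on 64-bit CPython)
print(sys.getsizeof(squares_list))
# The generator object stays small no matter how many items it will produce
print(sys.getsizeof(squares_gen))
```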

Comparison: List Comprehension vs For Loop vs map()

Here is how the three approaches compare for a common transformation -- converting a list of strings to integers:

string_numbers = ["1", "2", "3", "4", "5"]
 
# 1. For loop
result_loop = []
for s in string_numbers:
    result_loop.append(int(s))
 
# 2. List comprehension
result_comp = [int(s) for s in string_numbers]
 
# 3. map()
result_map = list(map(int, string_numbers))
 
# All three produce: [1, 2, 3, 4, 5]

| Criteria | For loop | List comprehension | map() |
|---|---|---|---|
| Lines of code | 3-4 | 1 | 1 |
| Readability | Verbose but explicit | Concise, Pythonic | Requires knowing map() |
| Speed (simple transform) | Slowest | Fast | Fastest with a C built-in such as int |
| Speed (complex transform) | Slow | Fast | Similar, or slower with a lambda |
| Filtering support | Manual if | Built-in if clause | Needs filter() separately |
| Debugging | Easy (breakpoints) | Harder (single expression) | Harder (lazy) |
| Memory (with list) | Same | Same | Same (when wrapped in list()) |
| Pythonic? | Acceptable | Preferred | Less common in modern Python |

When to Use Each

  • List comprehension: Default choice for creating a new list from an existing iterable. Handles both transformation and filtering in one expression. Widely considered the idiomatic choice by the Python community.
  • For loop: Use when the loop body has side effects (writing to a file, updating a database, printing), when the logic is too complex for a single expression, or when you need try/except inside the loop.
  • map(): Use when you already have a named function and the operation is a simple one-to-one transformation with no filtering. Slightly faster than comprehensions for built-in functions like int, str, float.
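The filtering difference is easiest to see side by side. A sketch with made-up input that keeps only digit strings and converts them to integers:

```python
raw = ["10", "x", "7", "", "42"]

# map() needs a separate filter() call to skip non-numeric strings
via_map = list(map(int, filter(str.isdigit, raw)))

# The comprehension expresses the filter and the transform in one place
via_comp = [int(s) for s in raw if s.isdigit()]

print(via_map, via_comp)
# [10, 7, 42] [10, 7, 42]
```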

Performance Benchmarks

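Here are timing comparisons measured with Python 3.12. They are illustrative: absolute numbers depend on hardware and Python version, so run the script yourself before drawing conclusions. All tests transform a list of 1,000,000 integers.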

import timeit
 
n = 1_000_000
data = list(range(n))
 
# Benchmark: squaring each element
loop_time = timeit.timeit("""
result = []
for x in data:
    result.append(x ** 2)
""", globals={"data": data}, number=10)
 
comp_time = timeit.timeit("""
result = [x ** 2 for x in data]
""", globals={"data": data}, number=10)
 
map_time = timeit.timeit("""
result = list(map(lambda x: x ** 2, data))
""", globals={"data": data}, number=10)
 
print(f"For loop:           {loop_time:.3f}s")
print(f"List comprehension: {comp_time:.3f}s")
print(f"map() with lambda:  {map_time:.3f}s")

Typical results:

| Method | Time (10 runs, 1M items) | Relative speed |
|---|---|---|
| For loop | ~5.2s | 1.0x (baseline) |
| List comprehension | ~3.8s | 1.37x faster |
| map() with lambda | ~4.5s | 1.16x faster |
| map() with built-in (not in the script above) | ~2.1s | 2.48x faster |

Key takeaways:

  1. List comprehensions are typically 20-40% faster than for loops for building lists. CPython uses a LIST_APPEND bytecode instruction, which avoids looking up and calling the list.append method on every iteration.
  2. map() with a lambda is slower than list comprehension because of the function call overhead on every element.
  3. map() with a C-implemented built-in (like int, float, str) is the fastest because there is no Python-level function call at all.
  4. For data science workloads with NumPy arrays or pandas DataFrames, vectorized operations are usually faster than all three, often by an order of magnitude or more.
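The LIST_APPEND claim can be checked with the standard dis module. This sketch compares opcode names only. Exact bytecode differs between CPython versions (before 3.12 a comprehension compiles to a separate nested code object, while 3.12 inlines it), so the helper also searches nested code objects:

```python
import dis
import types


def squares_comp(data):
    return [x ** 2 for x in data]


def squares_loop(data):
    result = []
    for x in data:
        result.append(x ** 2)
    return result


def opnames(code):
    """Collect opcode names from a code object and any code objects nested in it."""
    names = {ins.opname for ins in dis.get_instructions(code)}
    for const in code.co_consts:
        if isinstance(const, types.CodeType):  # e.g. a separate <listcomp> before 3.12
            names |= opnames(const)
    return names


comp_ops = opnames(squares_comp.__code__)
loop_ops = opnames(squares_loop.__code__)
print("LIST_APPEND" in comp_ops)  # True
print("LIST_APPEND" in loop_ops)  # False: the loop calls the append method instead
```

This is CPython-specific; other implementations such as PyPy compile Python differently.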

Real-World Examples

Data Cleaning and Processing

# Clean and normalize a list of email addresses
raw_emails = ["  Alice@EXAMPLE.com ", "BOB@test.COM", " charlie@demo.org  ", ""]
clean_emails = [
    email.strip().lower()
    for email in raw_emails
    if email.strip()  # Skip empty strings
]
print(clean_emails)
# ['alice@example.com', 'bob@test.com', 'charlie@demo.org']
 
# Parse CSV-like lines into structured data
lines = [
    "Alice,88,Engineering",
    "Bob,95,Marketing",
    "Charlie,72,Engineering",
]
records = [
    {"name": parts[0], "score": int(parts[1]), "dept": parts[2]}
    for line in lines
    for parts in [line.split(",")]
]
print(records)
# [{'name': 'Alice', 'score': 88, 'dept': 'Engineering'}, ...]
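An alternative to the `for parts in [line.split(",")]` idiom is to unpack each split line directly into named variables, which also documents the expected columns. A sketch using the same sample lines:

```python
lines = [
    "Alice,88,Engineering",
    "Bob,95,Marketing",
    "Charlie,72,Engineering",
]

# The inner generator splits each line; the outer for unpacks exactly three fields
records = [
    {"name": name, "score": int(score), "dept": dept}
    for name, score, dept in (line.split(",") for line in lines)
]
print(records[0])
# {'name': 'Alice', 'score': 88, 'dept': 'Engineering'}
```

One design consequence: a line with more or fewer than three fields raises ValueError during unpacking instead of silently producing a wrong record.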

Working with Files

# Read non-empty, non-comment lines from a config file
with open("config.txt") as f:
    settings = [
        line.strip()
        for line in f
        if line.strip() and not line.strip().startswith("#")
    ]
 
# Get all .py files from a directory listing
import os
 
py_files = [
    f for f in os.listdir(".")
    if f.endswith(".py") and os.path.isfile(f)
]
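The config.txt example calls line.strip() up to three times per line. Below is a sketch that strips once using an assignment expression (:=, Python 3.8+). It uses io.StringIO as a stand-in for the file so it runs without one; the file contents are made up:

```python
import io

# io.StringIO behaves like an open text file for line-by-line iteration
config_file = io.StringIO("""\
# database settings
host = localhost

port = 5432
   # indented comment
""")

settings = [
    stripped
    for line in config_file
    if (stripped := line.strip()) and not stripped.startswith("#")
]
print(settings)
# ['host = localhost', 'port = 5432']
```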

Working with JSON and API Data

# Extract specific fields from API response
users = [
    {"id": 1, "name": "Alice", "active": True},
    {"id": 2, "name": "Bob", "active": False},
    {"id": 3, "name": "Charlie", "active": True},
]
 
active_names = [user["name"] for user in users if user["active"]]
print(active_names)
# ['Alice', 'Charlie']
 
# Create a lookup dictionary from a list of records
user_lookup = {user["id"]: user["name"] for user in users}
print(user_lookup)
# {1: 'Alice', 2: 'Bob', 3: 'Charlie'}

Data Science with Pandas

List comprehensions work well alongside pandas for tasks that do not fit neatly into vectorized operations:

import pandas as pd
 
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "scores": ["88,92,95", "70,85,90", "60,75,80", "95,98,100"]
})
 
# Parse comma-separated scores into lists
df["score_list"] = [list(map(int, s.split(","))) for s in df["scores"]]
 
# Calculate average from the parsed lists
df["average"] = [sum(scores) / len(scores) for scores in df["score_list"]]
 
print(df[["name", "average"]])
#       name    average
# 0    Alice  91.666667
# 1      Bob  81.666667
# 2  Charlie  71.666667
# 3    Diana  97.666667

If you work with DataFrames interactively and want to visualize these results without leaving your notebook, PyGWalker turns any pandas DataFrame into a Tableau-like interactive visualization interface. You can drag and drop columns to explore patterns -- useful when your list comprehension generates a new column and you want to immediately see how the data distributes.

import pygwalker as pyg
 
# After your list comprehension creates new columns, explore visually
walker = pyg.walk(df)

When NOT to Use List Comprehensions

List comprehensions are powerful, but they have clear limits. Using them in the wrong context produces code that is harder to read, debug, and maintain.

1. Complex Logic That Spans Multiple Steps

If the expression or conditions are hard to read on one line, use a for loop:

# Bad -- too much logic crammed into one expression
result = [
    transform(x).upper().strip()
    for x in data
    if validate(x) and x.category in allowed_categories and x.date > cutoff
]
 
# Better -- explicit loop with clear variable names
result = []
for x in data:
    if not validate(x):
        continue
    if x.category not in allowed_categories:
        continue
    if x.date <= cutoff:
        continue
    cleaned = transform(x).upper().strip()
    result.append(cleaned)
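A middle ground is to keep the comprehension but move the conditions into a named predicate function. The Item class, its fields, the filter settings, and the sample data below are hypothetical stand-ins for the undefined names in the example above:

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical record type and filter settings, for illustration only
@dataclass
class Item:
    name: str
    category: str
    created: date


allowed_categories = {"books", "music"}
cutoff = date(2024, 1, 1)


def is_wanted(item: Item) -> bool:
    """Apply each filtering rule on its own line, as the loop version does."""
    if item.category not in allowed_categories:
        return False
    return item.created > cutoff


items = [
    Item("  dune ", "books", date(2024, 3, 1)),
    Item("kettle", "kitchen", date(2024, 5, 1)),
    Item("jazz lp", "music", date(2023, 6, 1)),
]

# The comprehension stays short because the rules live in is_wanted()
result = [item.name.strip().upper() for item in items if is_wanted(item)]
print(result)
# ['DUNE']
```

The predicate can be unit-tested and given a breakpoint on its own, which addresses the debugging drawback of a long single expression.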

2. Side Effects (Printing, Writing, Mutating)

List comprehensions should create data, not perform actions:

# Bad -- using comprehension for side effects
[print(x) for x in range(10)]  # Creates a useless list of None values
 
# Good -- use a for loop for side effects
for x in range(10):
    print(x)

3. Deeply Nested Comprehensions

More than two levels of nesting becomes unreadable:

# Bad -- triple-nested comprehension
result = [x for group in data for subgroup in group for x in subgroup if x > 0]
 
# Better -- break it into steps or use a function
def extract_positive(data):
    result = []
    for group in data:
        for subgroup in group:
            for x in subgroup:
                if x > 0:
                    result.append(x)
    return result

4. Error Handling Required

You cannot put try/except inside a list comprehension:

# This does not work
# values = [int(x) for x in data try except 0]
 
# Use a helper function or a for loop
def safe_int(x, default=0):
    try:
        return int(x)
    except (ValueError, TypeError):
        return default
 
values = [safe_int(x) for x in ["1", "two", "3", None, "5"]]
print(values)
# [1, 0, 3, 0, 5]

5. Large Data That Does Not Fit in Memory

A list comprehension builds the entire result in memory. For very large datasets, use a generator expression or itertools instead:

# Bad -- creates a 10-million-element list just to sum it
total = sum([x ** 2 for x in range(10_000_000)])  # Wastes memory
 
# Good -- generator expression uses almost no memory
total = sum(x ** 2 for x in range(10_000_000))

Common Mistakes and Pitfalls

Confusing the Order of for and if

data = [3, -1, 0, 5, -2]  # sample values for the snippets below
 
# WRONG: if-else after for does not work
# result = [x for x in data if x > 0 else 0]  # SyntaxError
 
# RIGHT: if-else goes BEFORE for (ternary expression)
result = [x if x > 0 else 0 for x in data]
 
# RIGHT: simple filter goes AFTER for
result = [x for x in data if x > 0]

Forgetting That Comprehensions Create New Lists

# This does NOT modify the original list
original = [1, 2, 3, 4, 5]
[x * 2 for x in original]  # Creates a new list, then discards it
print(original)
# [1, 2, 3, 4, 5]  -- unchanged
 
# Assign the result to actually use it
doubled = [x * 2 for x in original]
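If you do want to change the existing list object, for example because other names refer to it, assign the comprehension's result to a full slice. The list's contents are replaced in place:

```python
original = [1, 2, 3, 4, 5]
alias = original  # a second name for the same list object

# Slice assignment replaces the contents of the existing list in place
original[:] = [x * 2 for x in original]
print(alias)
# [2, 4, 6, 8, 10]
```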

Variable Scope Leaking (Python 2 vs 3)

In Python 2, the loop variable from a list comprehension leaked into the enclosing scope. Python 3 fixed this -- the loop variable stays inside the comprehension:

# Python 3: x does not leak
squares = [x ** 2 for x in range(5)]
# print(x)  # NameError: name 'x' is not defined
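The isolation also works the other way: if a variable with the same name already exists, a Python 3 comprehension does not overwrite it.

```python
x = "outer value"
squares = [x ** 2 for x in range(5)]

# The comprehension's x is local to the comprehension, so the outer x is untouched
print(x)
# outer value
print(squares)
# [0, 1, 4, 9, 16]
```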

Sharing One Inner List Across Rows

# Bug: all rows share the same inner list object
# BAD
row = [0] * 3
matrix = [row for _ in range(3)]
matrix[0][0] = 1
print(matrix)
# [[1, 0, 0], [1, 0, 0], [1, 0, 0]]  -- all rows changed!
 
# GOOD: create a new list for each row
matrix = [[0] * 3 for _ in range(3)]
matrix[0][0] = 1
print(matrix)
# [[1, 0, 0], [0, 0, 0], [0, 0, 0]]  -- only first row changed
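The same bug appears with the common shortcut [[0] * 3] * 3, because multiplying the outer list copies references rather than inner lists. An identity check with is makes the sharing visible:

```python
# Multiplying the outer list repeats the same inner list object
shared = [[0] * 3] * 3
print(shared[0] is shared[1])  # True: one list object, three references

# The comprehension evaluates [0] * 3 again on every iteration
fresh = [[0] * 3 for _ in range(3)]
print(fresh[0] is fresh[1])    # False: each row is a separate list
```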

Using List Comprehensions with RunCell in Jupyter

When you are building data pipelines in Jupyter notebooks, list comprehensions are everywhere -- from data cleaning to feature engineering. But as comprehensions grow more complex, it can be hard to know whether to refactor into a loop, switch to a generator, or use a vectorized pandas operation instead.

RunCell is an AI agent that works directly inside Jupyter notebooks. It sees your variables, DataFrames, and imports, and provides context-aware suggestions. For list comprehensions specifically, RunCell can:

  • Suggest comprehension patterns. Describe what you need ("filter the DataFrame column where values are above the mean and normalize the rest") and RunCell generates the right comprehension or vectorized alternative.
  • Refactor complex comprehensions. When a nested comprehension becomes unreadable, RunCell can split it into clearer steps or convert it to a named function.
  • Recommend generators vs lists. RunCell can detect when a list comprehension creates unnecessary intermediate data and suggest a generator expression or itertools pattern instead.
  • Benchmark alternatives. Not sure if your comprehension is faster than a for loop for your specific dataset? RunCell can set up a quick timeit comparison right in the notebook.

FAQ

What is a Python list comprehension?

A Python list comprehension is a concise syntax for creating lists from existing iterables. It combines a for loop and an optional if condition into a single expression: [expression for item in iterable if condition]. List comprehensions are typically faster than equivalent for loops that call append(), because CPython compiles them to a dedicated LIST_APPEND bytecode instruction.

Is list comprehension faster than a for loop in Python?

Yes. List comprehensions are typically 20-40% faster than equivalent for loops that build a list with append(). The speed advantage comes from CPython's internal optimization: list comprehensions use a LIST_APPEND bytecode instruction that avoids the repeated attribute lookup of list.append(). For built-in operations, map() can be even faster, but list comprehensions win when filtering is involved.

Can I use if-else in a list comprehension?

Yes, but the placement matters. For conditional transformation (keeping all elements), put if-else before for: [x if x > 0 else 0 for x in data]. For filtering (removing elements), put if after for: [x for x in data if x > 0]. The two can be combined in one comprehension, with the if-else before for and the filter after it: [x if x > 0 else 0 for x in data if x is not None].

What is the difference between a list comprehension and a generator expression?

A list comprehension uses square brackets [] and builds the entire list in memory. A generator expression uses parentheses () and produces values lazily, one at a time. Use list comprehensions when you need random access or will iterate multiple times. Use generator expressions for large datasets where you only need to iterate once, since their memory use stays small regardless of how many items they produce.

When should I avoid list comprehensions?

Avoid list comprehensions when the logic is complex (multiple conditions, transformations, or error handling), when you need side effects (printing, writing files), when nesting goes beyond two levels, or when the dataset is large enough that building the full list wastes memory. In these cases, use a regular for loop, a generator expression, or a named function for clarity.

Can I nest list comprehensions?

Yes. Nested list comprehensions use multiple for clauses: [x for sublist in nested for x in sublist]. This is useful for flattening lists of lists or generating combinations. However, nesting beyond two levels hurts readability. If a comprehension needs three or more for clauses, refactor it into a regular loop or a helper function.

Conclusion

Python list comprehensions are one of the language's most useful features. They replace the verbose loop-initialize-append pattern with a single expression that is both easier to read and faster to execute. The syntax scales from simple transformations ([x * 2 for x in data]) to filtered mappings ([x for x in data if x > 0]) to nested flattening ([x for row in matrix for x in row]), and extends naturally to dictionary and set comprehensions.

The rules for using them well:

  • Use list comprehensions as the default for building new lists from iterables.
  • Use generator expressions when memory matters and you only iterate once.
  • Use for loops when you need side effects, error handling, or complex multi-step logic.
  • Use map() with existing named/built-in functions for simple one-to-one transforms.
  • Stop at two levels of nesting. Beyond that, refactor into a function or loop.
  • Never use a comprehension for side effects. If you are not using the return value, use a loop.

Master these patterns and you will write Python code that is cleaner, faster, and more idiomatic -- whether you are cleaning data in a Jupyter notebook, building an API, or processing files in a script.
