Python Generators: Complete Guide to yield, Generator Expressions, and Lazy Evaluation
Processing a 10GB log file or streaming millions of database records can bring your Python application to its knees. The traditional approach of loading all data into memory at once leads to performance bottlenecks, memory errors, and frustrated users. This is where Python generators become essential—they enable you to process massive datasets with minimal memory footprint by generating values on-demand rather than storing everything upfront.
What Are Python Generators and Why They Matter
Generators are special functions that produce a sequence of values over time rather than computing and returning them all at once. Unlike regular functions that use return to send back a single result, generators use the yield keyword to produce a series of values, pausing execution between each value and resuming when the next value is requested.
The fundamental advantage of generators is lazy evaluation—values are generated only when needed. This provides two critical benefits:
- Memory efficiency: Generators don't store the entire sequence in memory. A generator producing a billion numbers consumes the same memory as one producing ten numbers.
- Performance: Processing can start immediately on the first yielded value without waiting for the entire dataset to be prepared.
Here's a simple comparison illustrating the difference:
```python
# Traditional approach - loads entire list into memory
def get_squares_list(n):
    result = []
    for i in range(n):
        result.append(i * i)
    return result

# Generator approach - produces values one at a time
def get_squares_generator(n):
    for i in range(n):
        yield i * i

# Memory impact comparison
import sys

# List approach
squares_list = get_squares_list(1000000)
print(f"List memory: {sys.getsizeof(squares_list):,} bytes")  # ~8,000,000 bytes

# Generator approach
squares_gen = get_squares_generator(1000000)
print(f"Generator memory: {sys.getsizeof(squares_gen):,} bytes")  # ~112 bytes
```

The memory difference is staggering—the generator uses 99.999% less memory than the list for this example. This difference compounds dramatically with larger datasets.
The yield Keyword: Heart of Generator Functions
The yield keyword is what transforms a regular function into a generator function. When Python encounters yield, it knows to return a generator object instead of executing the function immediately.
```python
def countdown(n):
    print(f"Starting countdown from {n}")
    while n > 0:
        yield n
        n -= 1
    print("Countdown complete!")

# Creating the generator doesn't execute the function
gen = countdown(3)
print(type(gen))  # <class 'generator'>

# Values are produced on-demand
print(next(gen))  # Starting countdown from 3 -> 3
print(next(gen))  # 2
print(next(gen))  # 1
# next(gen)  # Countdown complete! -> Raises StopIteration
```

Key behaviors to understand:
- Execution pauses at each yield statement and resumes from that exact point on the next call
- Local variables maintain their state between yield calls
- A StopIteration exception is raised when the generator function returns (runs out of values)
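That last point has a useful corollary: if a generator function uses a `return` statement with a value, that value is attached to the StopIteration exception. A short demonstration:

```python
def gen_with_return():
    yield 1
    yield 2
    return "done"  # becomes StopIteration.value

g = gen_with_return()
print(next(g))  # 1
print(next(g))  # 2
try:
    next(g)
except StopIteration as e:
    print(e.value)  # done
```

A for loop swallows this value silently; you only see it when driving the generator manually with next() or via yield from.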
Multiple yield statements can appear in a single generator:
```python
def data_pipeline():
    # Phase 1: Loading
    yield "Loading data..."
    # Phase 2: Processing
    yield "Processing records..."
    # Phase 3: Validation
    yield "Validating results..."
    # Phase 4: Complete
    yield "Pipeline complete!"

for status in data_pipeline():
    print(status)
```

Generator Protocol: Understanding iter() and next()
Generators implement the iterator protocol through two special methods:
- `__iter__()`: Returns the iterator object itself (the generator)
- `__next__()`: Returns the next value from the generator
This makes generators perfect for use in for loops and other iteration contexts. Understanding this protocol helps clarify how generators work under the hood:
```python
def simple_gen():
    yield 1
    yield 2
    yield 3

gen = simple_gen()

# These are equivalent
print(gen.__next__())  # 1
print(next(gen))       # 2

# for loops call __next__() automatically until StopIteration
for value in simple_gen():
    print(value)  # 1, 2, 3
```

You can also manually implement the iterator protocol to create generator-like behavior:
```python
class CountDown:
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

# Behaves like a generator
for num in CountDown(3):
    print(num)  # 3, 2, 1
```

However, generator functions are much more concise and readable than manual iterator classes.
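For comparison, the same countdown behavior as a generator function needs only a loop and a yield:

```python
def count_down(start):
    # Equivalent to the CountDown iterator class: yields start, start-1, ..., 1
    while start > 0:
        yield start
        start -= 1

print(list(count_down(3)))  # [3, 2, 1]
```

The iterator protocol bookkeeping (tracking state, raising StopIteration) is handled for you.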
Generator Expressions vs List Comprehensions
Generator expressions provide a concise syntax for creating generators, similar to list comprehensions but with parentheses instead of brackets:
```python
# List comprehension - creates entire list in memory
squares_list = [x * x for x in range(10)]
print(type(squares_list))  # <class 'list'>
print(squares_list)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# Generator expression - creates generator object
squares_gen = (x * x for x in range(10))
print(type(squares_gen))  # <class 'generator'>
print(squares_gen)  # <generator object at 0x...>

# Consume the generator
print(list(squares_gen))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Syntax comparison:
| Feature | List Comprehension | Generator Expression |
|---|---|---|
| Syntax | [expr for item in iterable] | (expr for item in iterable) |
| Returns | List object | Generator object |
| Memory | Stores all values | Generates on-demand |
| Speed | Faster for small datasets | Faster for large datasets |
| Reusable | Yes (can iterate multiple times) | No (exhausted after one iteration) |
Practical example showing memory difference:
```python
import sys

# List comprehension for 1 million numbers
list_comp = [x for x in range(1000000)]
print(f"List comprehension: {sys.getsizeof(list_comp):,} bytes")

# Generator expression for the same range
gen_exp = (x for x in range(1000000))
print(f"Generator expression: {sys.getsizeof(gen_exp):,} bytes")

# Output:
# List comprehension: 8,000,056 bytes
# Generator expression: 112 bytes
```

Generator expressions are ideal when you only need to iterate through values once and want to minimize memory usage.
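A small syntactic convenience reinforces this pattern: when a generator expression is the sole argument to a function call, the extra parentheses can be dropped:

```python
# Parentheses can be omitted when the generator expression
# is the only argument to a call
total = sum(x * x for x in range(10))
print(total)  # 285

# With additional arguments, the inner parentheses are required
biggest = max((x * x for x in range(10)), default=0)
print(biggest)  # 81
```

This makes one-shot aggregations like sum(), any(), and max() over large ranges both readable and memory efficient.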
yield from: Delegating to Sub-Generators
The yield from statement simplifies delegating to sub-generators or other iterables. Instead of manually looping and yielding each value, yield from handles this automatically:
```python
# Without yield from
def get_numbers_manual():
    for i in range(3):
        yield i
    for i in range(10, 13):
        yield i

# With yield from
def get_numbers_delegated():
    yield from range(3)
    yield from range(10, 13)

print(list(get_numbers_manual()))     # [0, 1, 2, 10, 11, 12]
print(list(get_numbers_delegated()))  # [0, 1, 2, 10, 11, 12]
```

This is particularly useful for flattening nested structures:
```python
def flatten(nested_list):
    for item in nested_list:
        if isinstance(item, list):
            yield from flatten(item)  # Recursive delegation
        else:
            yield item

nested = [1, [2, 3, [4, 5]], 6, [7, [8, 9]]]
print(list(flatten(nested)))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

yield from also properly handles exceptions and return values from sub-generators, making it essential for complex generator pipelines.
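The return-value handling is worth seeing concretely: the value a sub-generator returns becomes the value of the yield from expression in the delegating generator:

```python
def subtask():
    yield "step 1"
    yield "step 2"
    return "subtask result"  # propagated to the delegating generator

def coordinator():
    result = yield from subtask()  # receives the sub-generator's return value
    yield f"got: {result}"

print(list(coordinator()))  # ['step 1', 'step 2', 'got: subtask result']
```

Doing this manually would require catching StopIteration and reading its value attribute; yield from does it transparently.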
Advanced: send() and throw() Methods
Generators can be more than just value producers—they can also receive values and handle exceptions through the send() and throw() methods, enabling coroutine-style bidirectional communication.
Using send() to Send Values into Generators
```python
def running_average():
    total = 0
    count = 0
    average = None
    while True:
        value = yield average  # Yield current average, receive new value
        total += value
        count += 1
        average = total / count

# Create generator
avg = running_average()
next(avg)  # Prime the generator (advance to first yield)

# Send values and receive running averages
print(avg.send(10))  # 10.0
print(avg.send(20))  # 15.0
print(avg.send(30))  # 20.0
print(avg.send(40))  # 25.0
```

The send() method both sends a value into the generator (which becomes the result of the yield expression) and advances execution to the next yield.
Using throw() to Inject Exceptions
```python
def error_handling_gen():
    try:
        while True:
            value = yield
            print(f"Received: {value}")
    except ValueError as e:
        print(f"Caught ValueError: {e}")
        yield "Recovered from error"
    except GeneratorExit:
        print("Generator is closing")

gen = error_handling_gen()
next(gen)     # Prime the generator
gen.send(10)  # Received: 10
gen.send(20)  # Received: 20
result = gen.throw(ValueError("Invalid value"))  # Caught ValueError: Invalid value
print(result)  # Recovered from error
gen.close()   # Closes silently: paused outside the try block, so GeneratorExit isn't caught

# To see the GeneratorExit handler fire, close a generator
# that is still suspended inside the try block
gen2 = error_handling_gen()
next(gen2)
gen2.close()  # Generator is closing
```

These advanced features are particularly useful for implementing state machines, coroutines, and complex asynchronous patterns.
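As a minimal sketch of the state-machine idea, here is a traffic-light coroutine that advances to its next state each time it receives a tick via send() (the state names and "tick" command are illustrative, not from any particular library):

```python
def traffic_light():
    # Cycles red -> green -> yellow -> red on each "tick" command
    states = ["red", "green", "yellow"]
    i = 0
    while True:
        command = yield states[i]
        if command == "tick":
            i = (i + 1) % len(states)

light = traffic_light()
print(next(light))         # red (prime the coroutine)
print(light.send("tick"))  # green
print(light.send("tick"))  # yellow
print(light.send("tick"))  # red
```

The generator holds the current state between sends, so no class or external state variable is needed.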
Infinite Generators: Endless Sequences
Generators excel at producing infinite sequences because they never need to materialize the entire sequence in memory:
```python
# Infinite counter
def count_from(start=0, step=1):
    current = start
    while True:
        yield current
        current += step

# Fibonacci sequence
def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Cycling through a sequence
def cycle(iterable):
    saved = []
    for item in iterable:
        yield item
        saved.append(item)
    while saved:
        for item in saved:
            yield item

# Usage examples
counter = count_from(10, 2)
for _ in range(5):
    print(next(counter))  # 10, 12, 14, 16, 18

fib = fibonacci()
print([next(fib) for _ in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

colors = cycle(['red', 'green', 'blue'])
print([next(colors) for _ in range(8)])
# ['red', 'green', 'blue', 'red', 'green', 'blue', 'red', 'green']
```

Infinite generators are particularly useful for event streams, continuous monitoring, and stateful iteration patterns.
Chaining Generators: Building Data Processing Pipelines
One of the most powerful patterns with generators is chaining them together to create efficient data processing pipelines. Each stage processes data lazily and passes results to the next stage without storing intermediate results:
```python
# Stage 1: Read lines from a file (generator)
def read_log_file(filename):
    with open(filename, 'r') as f:
        for line in f:
            yield line.strip()

# Stage 2: Filter lines containing 'ERROR'
def filter_errors(lines):
    for line in lines:
        if 'ERROR' in line:
            yield line

# Stage 3: Extract timestamp and message
def parse_error_lines(lines):
    for line in lines:
        parts = line.split(' - ')
        if len(parts) >= 2:
            yield {'timestamp': parts[0], 'message': parts[1]}

# Stage 4: Count errors by hour
def group_by_hour(errors):
    from collections import defaultdict
    hourly_counts = defaultdict(int)
    for error in errors:
        hour = error['timestamp'][:13]  # Extract hour portion
        hourly_counts[hour] += 1
    return hourly_counts

# Build pipeline
log_lines = read_log_file('app.log')
error_lines = filter_errors(log_lines)
parsed_errors = parse_error_lines(error_lines)
results = group_by_hour(parsed_errors)
print(results)
```

This pipeline processes a potentially huge log file with minimal memory usage—only one line is in memory at any time until the final aggregation stage.
Another example with data transformation:
```python
# Pipeline: numbers -> square -> filter evens -> sum
def square_numbers(numbers):
    for n in numbers:
        yield n * n

def filter_even(numbers):
    for n in numbers:
        if n % 2 == 0:
            yield n

# Chain the pipeline
numbers = range(1, 11)  # 1-10
squared = square_numbers(numbers)
evens = filter_even(squared)
result = sum(evens)  # Only even squares
print(result)  # 220 (4 + 16 + 36 + 64 + 100)
```

Memory Comparison: Generator vs List Benchmark
Let's conduct a real-world memory and performance benchmark to quantify the benefits of generators:
```python
import sys
import time
import tracemalloc

def process_with_list(n):
    """Traditional approach using lists"""
    tracemalloc.start()
    start_time = time.time()
    # Create list of squares
    squares = [x * x for x in range(n)]
    # Filter even squares
    even_squares = [x for x in squares if x % 2 == 0]
    # Sum results
    result = sum(even_squares)
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    elapsed = time.time() - start_time
    return result, peak / 1024 / 1024, elapsed  # Peak memory in MB

def process_with_generator(n):
    """Generator approach"""
    tracemalloc.start()
    start_time = time.time()
    # Generator pipeline
    squares = (x * x for x in range(n))
    even_squares = (x for x in squares if x % 2 == 0)
    result = sum(even_squares)
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    elapsed = time.time() - start_time
    return result, peak / 1024 / 1024, elapsed

# Benchmark with 1 million numbers
n = 1000000
list_result, list_memory, list_time = process_with_list(n)
gen_result, gen_memory, gen_time = process_with_generator(n)

print(f"Results match: {list_result == gen_result}")
print(f"\nList approach:")
print(f"  Memory: {list_memory:.2f} MB")
print(f"  Time: {list_time:.4f} seconds")
print(f"\nGenerator approach:")
print(f"  Memory: {gen_memory:.2f} MB")
print(f"  Time: {gen_time:.4f} seconds")
print(f"\nMemory savings: {((list_memory - gen_memory) / list_memory * 100):.1f}%")
```

Typical output:
```
Results match: True

List approach:
  Memory: 36.21 MB
  Time: 0.0892 seconds

Generator approach:
  Memory: 0.12 MB
  Time: 0.0624 seconds

Memory savings: 99.7%
```

In this run, the generator approach uses 99.7% less memory and runs about 30% faster—an improvement that compounds with larger datasets.
The itertools Module: Generator Utilities
Python's itertools module provides a collection of powerful generator-style tools for efficient iteration. In CPython, these utilities are implemented in C and highly optimized:
Essential itertools Functions
```python
import itertools

# chain - concatenate multiple iterables
combined = itertools.chain([1, 2], [3, 4], [5, 6])
print(list(combined))  # [1, 2, 3, 4, 5, 6]

# islice - slice an iterable (like list slicing but for generators)
numbers = itertools.count()  # Infinite counter: 0, 1, 2, 3...
first_ten = itertools.islice(numbers, 10)
print(list(first_ten))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# count - infinite counter with start and step
counter = itertools.count(start=10, step=2)
print([next(counter) for _ in range(5)])  # [10, 12, 14, 16, 18]

# cycle - infinite repetition of an iterable
colors = itertools.cycle(['red', 'green', 'blue'])
print([next(colors) for _ in range(7)])
# ['red', 'green', 'blue', 'red', 'green', 'blue', 'red']

# accumulate - cumulative sums or other operations
numbers = [1, 2, 3, 4, 5]
cumulative = itertools.accumulate(numbers)
print(list(cumulative))  # [1, 3, 6, 10, 15]

# accumulate with custom function
import operator
products = itertools.accumulate(numbers, operator.mul)
print(list(products))  # [1, 2, 6, 24, 120]

# groupby - group consecutive elements by key
data = [('A', 1), ('A', 2), ('B', 3), ('B', 4), ('C', 5)]
for key, group in itertools.groupby(data, key=lambda x: x[0]):
    print(f"{key}: {list(group)}")
# A: [('A', 1), ('A', 2)]
# B: [('B', 3), ('B', 4)]
# C: [('C', 5)]
```

Practical itertools Combinations
```python
import itertools

# Paginating results with islice
def paginate(iterable, page_size):
    iterator = iter(iterable)
    while True:
        page = list(itertools.islice(iterator, page_size))
        if not page:
            break
        yield page

# Usage
data = range(25)
for page_num, page in enumerate(paginate(data, 10), 1):
    print(f"Page {page_num}: {page}")
# Page 1: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
# Page 2: [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
# Page 3: [20, 21, 22, 23, 24]

# Windowed iteration (sliding window)
def window(iterable, size):
    it = iter(iterable)
    win = list(itertools.islice(it, size))
    if len(win) == size:
        yield tuple(win)
    for item in it:
        win = win[1:] + [item]
        yield tuple(win)

print(list(window([1, 2, 3, 4, 5], 3)))
# [(1, 2, 3), (2, 3, 4), (3, 4, 5)]
```

Real-World Use Cases
Reading Large Files Line by Line
```python
def process_large_csv(filename):
    """Process a multi-GB CSV file efficiently"""
    with open(filename, 'r') as f:
        # Skip header
        next(f)
        for line in f:
            # Parse and yield record
            fields = line.strip().split(',')
            yield {
                'user_id': fields[0],
                'action': fields[1],
                'timestamp': fields[2]
            }

# Process millions of records with minimal memory
for record in process_large_csv('user_events.csv'):
    # Process one record at a time
    if record['action'] == 'purchase':
        print(f"Purchase by user {record['user_id']}")
```

Streaming Data Processing
```python
import requests

def stream_api_data(url, batch_size=100):
    """Stream paginated API data without loading all results"""
    offset = 0
    while True:
        response = requests.get(url, params={'offset': offset, 'limit': batch_size})
        data = response.json()
        if not data:
            break
        for item in data:
            yield item
        offset += batch_size

# Process unlimited API results
for item in stream_api_data('https://api.example.com/records'):
    process_item(item)
```

Database Query Result Iteration
```python
def fetch_users_batch(cursor, batch_size=1000):
    """Fetch database records in batches without loading all into memory"""
    while True:
        results = cursor.fetchmany(batch_size)
        if not results:
            break
        for row in results:
            yield row

# Database query
cursor.execute("SELECT * FROM users WHERE active = 1")

# Process millions of users efficiently
for user in fetch_users_batch(cursor):
    send_email(user['email'], generate_report(user))
```

ETL Pipeline Example
```python
# Extract: Read from source
def extract_from_csv(filename):
    with open(filename, 'r') as f:
        for line in f:
            yield line.strip().split(',')

# Transform: Clean and convert data
def transform_records(records):
    for record in records:
        yield {
            'id': int(record[0]),
            'name': record[1].title(),
            'email': record[2].lower(),
            'age': int(record[3]) if record[3] else None
        }

# Load: Write to database
def load_to_database(records, db_connection):
    for record in records:
        db_connection.execute(
            "INSERT INTO users VALUES (?, ?, ?, ?)",
            (record['id'], record['name'], record['email'], record['age'])
        )
        yield record  # Pass through for logging

# Build ETL pipeline
raw_data = extract_from_csv('users.csv')
transformed = transform_records(raw_data)
loaded = load_to_database(transformed, db_conn)

# Execute pipeline and count processed records
processed_count = sum(1 for _ in loaded)
print(f"Processed {processed_count} records")
```

Generator Best Practices and Common Pitfalls
Best Practices
- Use generator expressions for simple cases

```python
# Simple transformation - use generator expression
squares = (x * x for x in range(1000))

# Complex logic - use generator function
def complex_processing(data):
    for item in data:
        # Multi-step processing
        result = step1(item)
        result = step2(result)
        if validate(result):
            yield result
```

- Chain generators for data pipelines

```python
# Each stage processes lazily
data = read_source()
filtered = filter_stage(data)
transformed = transform_stage(filtered)
results = aggregate_stage(transformed)
```

- Use yield from for delegation

```python
def process_all_files(directory):
    for filename in os.listdir(directory):
        yield from process_file(filename)
```
Common Pitfalls
- Generators are exhausted after one iteration

```python
gen = (x for x in range(3))
print(list(gen))  # [0, 1, 2]
print(list(gen))  # [] - exhausted!

# Solution: Convert to list or recreate the generator
data = list(gen)  # If data fits in memory
# OR
gen = (x for x in range(3))  # Recreate
```

- Generators don't support len() or indexing

```python
gen = (x for x in range(10))
# len(gen)  # TypeError
# gen[5]    # TypeError

# Solution: Convert to a list if you need these operations
items = list(gen)
print(len(items))
print(items[5])
```

- Be careful with scope and closures in loops

```python
# Wrong - all lambdas see the final value of i
generators = [lambda: i for i in range(3)]
print([g() for g in generators])  # [2, 2, 2]

# Correct - capture i as a default argument
generators = [lambda i=i: i for i in range(3)]
print([g() for g in generators])  # [0, 1, 2]
```

- Exception handling in generator chains

```python
def stage1():
    for i in range(5):
        if i == 3:
            raise ValueError("Error in stage1")
        yield i

def stage2(data):
    try:
        for item in data:
            yield item * 2
    except ValueError as e:
        print(f"Caught: {e}")
        yield -1  # Error marker

# Exception is caught in stage2
for result in stage2(stage1()):
    print(result)
```
Comparison: Generators vs Lists vs Iterators vs map/filter
| Feature | Generators | Lists | Iterators | map/filter |
|---|---|---|---|---|
| Memory usage | Minimal (lazy) | Full dataset | Minimal (lazy) | Minimal (lazy) |
| Creation speed | Instant | Depends on size | Instant | Instant |
| Reusable | No | Yes | No | No |
| Indexable | No | Yes | No | No |
| len() support | No | Yes | No | No |
| Modification | Read-only | Mutable | Read-only | Read-only |
| Infinite sequences | Yes | No | Yes | Yes |
| Syntax | yield or () | [] | iter() | map(), filter() |
| Best for | Large datasets, pipelines | Small datasets, random access | Protocol implementation | Functional transformations |
Example comparison:
# All produce same results but with different characteristics
data = range(1000000)
# Generator - memory efficient, not reusable
gen = (x * 2 for x in data)
# List - memory intensive, reusable, indexable
lst = [x * 2 for x in data]
# map - memory efficient, functional style
mapped = map(lambda x: x * 2, data)
# Iterator - explicit protocol implementation
class Doubler:
def __init__(self, data):
self.data = iter(data)
def __iter__(self):
return self
def __next__(self):
return next(self.data) * 2
iterator = Doubler(data)Experimenting with Generators in Jupyter
When exploring generator patterns and performance characteristics, working in an interactive notebook environment accelerates learning. RunCell brings AI-powered assistance directly into Jupyter notebooks, making it ideal for data scientists experimenting with generator-based data processing pipelines.
With RunCell, you can:
- Quickly prototype generator functions and test memory characteristics
- Benchmark generator vs list performance with real datasets
- Build and debug complex generator pipelines interactively
- Get AI suggestions for optimizing generator-based ETL workflows
Here's how you might explore generators in a notebook:
```python
# Cell 1: Define generator pipeline
def read_data():
    for i in range(1000000):
        yield {'id': i, 'value': i * 2}

def filter_large(records):
    for record in records:
        if record['value'] > 1000:
            yield record

def transform(records):
    for record in records:
        record['squared'] = record['value'] ** 2
        yield record

# Cell 2: Execute pipeline and measure
import itertools
import time

start = time.time()
pipeline = transform(filter_large(read_data()))
results = list(itertools.islice(pipeline, 100))  # Take first 100
print(f"Time: {time.time() - start:.4f}s")
print(f"Results: {len(results)}")

# Cell 3: Visualize with PyGWalker
import pygwalker as pyg
pyg.walk(results)
```
Conclusion
Python generators represent a fundamental shift from eager to lazy evaluation, enabling memory-efficient processing of datasets ranging from thousands to billions of records. By understanding yield, generator expressions, the iterator protocol, and advanced features like send() and yield from, you can build sophisticated data processing pipelines that scale effortlessly.
The key insights to remember:
- Generators use lazy evaluation to minimize memory footprint—often 99%+ savings compared to lists
- Use generator expressions for simple transformations, generator functions for complex logic
- Chain generators to build memory-efficient data processing pipelines
- Leverage itertools for powerful generator-based iteration utilities
- Choose generators for large datasets and single-pass iteration; choose lists for small datasets requiring random access
Whether you're processing massive log files, streaming API data, or building ETL pipelines, generators provide the performance and memory efficiency needed for production-scale data processing. For I/O-bound workloads, consider combining generators with asyncio and async generators to overlap waiting with processing. Master these patterns and you'll write Python code that handles datasets of any size with elegance and efficiency.
Related Guides
- Python Asyncio — async/await concurrency for I/O-bound workloads
- Python Collections Module — Counter, defaultdict, deque, and namedtuple
- Python Threading — multithreading and ThreadPoolExecutor patterns
- Python Type Hints — annotating generator return types with Generator[YieldType, SendType, ReturnType]