Python defaultdict: Simplify Dictionary Operations with Default Values

Name: Soren Atelier

Updated on 2/10/2026

Every Python developer has hit this wall: you write a clean loop to group or count items using a dictionary, run the code, and a KeyError crashes the whole script because one key didn't exist yet. The standard workaround is to sprinkle if key in dict checks or try/except KeyError blocks everywhere, and a grouping loop that should take a few lines balloons into twice as many lines of defensive boilerplate.

This gets worse at scale. When you're building adjacency lists for graphs, aggregating log data, or counting word frequencies across millions of records, those guard clauses add up. They slow you down as a developer, make the code harder to review, and introduce subtle bugs when you forget a check in one branch.

Python's collections.defaultdict eliminates this entire category of problems. It is a dictionary subclass that calls a factory function to supply a value whenever you look up a missing key. No more KeyError on first access, no more guard clauses, no more boilerplate.

What is defaultdict?

The defaultdict is a subclass of Python's built-in dict. The key difference: when you look up a key that doesn't exist with square brackets (dd[key]), defaultdict automatically creates it with a default value instead of raising KeyError.

from collections import defaultdict
 
# Regular dict raises KeyError
regular = {}
# regular['missing']  # KeyError: 'missing'
 
# defaultdict creates the value automatically
dd = defaultdict(int)
dd['missing']  # Returns 0, and now 'missing' is a key
print(dd)  # defaultdict(<class 'int'>, {'missing': 0})
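Under the hood, a missing-key lookup with square brackets calls the dictionary's __missing__ method, which defaultdict implements by calling its factory and storing the result. Other read operations do not go through __missing__, so they leave the dictionary unchanged. A small sketch to check this:

```python
from collections import defaultdict

dd = defaultdict(list)

# Membership tests and .get() never call the factory
print('a' in dd)    # False
print(dd.get('a'))  # None
print(len(dd))      # 0, nothing was inserted

# Square-bracket access goes through __missing__, which calls list() and stores the result
dd['a'].append(1)
print(dict(dd))     # {'a': [1]}
```

This is why code that only reads from a defaultdict should prefer get() or in when it must not grow the dictionary.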

The constructor takes a factory as its first argument: any callable that takes no arguments, stored on the instance as default_factory. Common factories:

int -- returns 0
list -- returns []
set -- returns set()
str -- returns ""
lambda: value -- returns any custom default
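A quick sketch that prints the default each of these factories produces. The 'unknown' string returned by the lambda is just an illustrative value; passing a plain value such as 0 instead of a callable raises TypeError.

```python
from collections import defaultdict

factories = {
    'int': int,
    'list': list,
    'set': set,
    'str': str,
    'lambda': lambda: 'unknown',  # illustrative custom default
}

for name, factory in factories.items():
    dd = defaultdict(factory)
    # Accessing a missing key stores and returns factory()
    print(name, repr(dd['missing']))
# int 0
# list []
# set set()
# str ''
# lambda 'unknown'

# The first argument must be callable (or None), not a default value
try:
    defaultdict(0)
except TypeError as exc:
    print('TypeError:', exc)
```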

defaultdict(int) -- The Counting Pattern

The most common use. Every new key starts at 0, so you can increment immediately.

from collections import defaultdict
 
words = ['apple', 'banana', 'apple', 'cherry', 'banana', 'apple']
 
# Without defaultdict
counts_regular = {}
for word in words:
    if word in counts_regular:
        counts_regular[word] += 1
    else:
        counts_regular[word] = 1
 
# With defaultdict(int) -- clean and direct
counts = defaultdict(int)
for word in words:
    counts[word] += 1
 
print(dict(counts))
# {'apple': 3, 'banana': 2, 'cherry': 1}
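For pure counting, the standard library also ships collections.Counter, a dict subclass built for tallies. This sketch confirms it produces the same counts as the defaultdict(int) loop and shows its most_common() helper for ranking:

```python
from collections import Counter, defaultdict

words = ['apple', 'banana', 'apple', 'cherry', 'banana', 'apple']

# defaultdict(int) tally, same pattern as above
counts = defaultdict(int)
for word in words:
    counts[word] += 1

# Counter builds the same tally from any iterable in one call
tally = Counter(words)

print(dict(counts) == dict(tally))  # True
print(tally.most_common(2))         # [('apple', 3), ('banana', 2)]
```

Counter adds counting-specific helpers; defaultdict(int) is the same accumulation idea generalized to any factory.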

defaultdict(list) -- The Grouping Pattern

Group related items together. Each new key starts with an empty list.

from collections import defaultdict
 
students = [
    ('Math', 'Alice'),
    ('Science', 'Bob'),
    ('Math', 'Charlie'),
    ('Science', 'Diana'),
    ('Math', 'Eve'),
    ('History', 'Frank'),
]
 
groups = defaultdict(list)
for subject, student in students:
    groups[subject].append(student)
 
for subject, names in groups.items():
    print(f"{subject}: {', '.join(names)}")
 
# Math: Alice, Charlie, Eve
# Science: Bob, Diana
# History: Frank
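One reason to reach for defaultdict(list) instead of itertools.groupby is that groupby only merges consecutive items with the same key, so unsorted input produces split groups. A sketch with the same student data:

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

students = [
    ('Math', 'Alice'), ('Science', 'Bob'), ('Math', 'Charlie'),
    ('Science', 'Diana'), ('Math', 'Eve'), ('History', 'Frank'),
]

# groupby on unsorted data: 'Math' shows up as three separate runs
runs = [(subject, [name for _, name in grp])
        for subject, grp in groupby(students, key=itemgetter(0))]
print(len(runs))  # 6

# defaultdict(list) groups in one pass, no sorting needed
groups = defaultdict(list)
for subject, name in students:
    groups[subject].append(name)
print(dict(groups))
# {'Math': ['Alice', 'Charlie', 'Eve'], 'Science': ['Bob', 'Diana'], 'History': ['Frank']}
```

groupby does work after sorting the input by the key, but that adds a sort and reorders the groups alphabetically.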

Group Records by Multiple Fields

from collections import defaultdict
 
sales = [
    {'region': 'East', 'product': 'Widget', 'amount': 100},
    {'region': 'West', 'product': 'Gadget', 'amount': 200},
    {'region': 'East', 'product': 'Widget', 'amount': 150},
    {'region': 'West', 'product': 'Widget', 'amount': 300},
]
 
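# A tuple key groups by two fields at once
by_region_product = defaultdict(list)
for sale in sales:
    key = (sale['region'], sale['product'])
    by_region_product[key].append(sale['amount'])
 
for (region, product), amounts in by_region_product.items():
    total = sum(amounts)
    print(f"{region} - {product}: {amounts} (total: {total})")
# East - Widget: [100, 150] (total: 250)
# West - Gadget: [200] (total: 200)
# West - Widget: [300] (total: 300)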
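If you only need the totals and not the individual amounts, a defaultdict(int) keyed by the same (region, product) tuple skips the intermediate lists. A minimal sketch on the same sales records:

```python
from collections import defaultdict

sales = [
    {'region': 'East', 'product': 'Widget', 'amount': 100},
    {'region': 'West', 'product': 'Gadget', 'amount': 200},
    {'region': 'East', 'product': 'Widget', 'amount': 150},
    {'region': 'West', 'product': 'Widget', 'amount': 300},
]

# Accumulate a running total per (region, product) pair
totals = defaultdict(int)
for sale in sales:
    totals[(sale['region'], sale['product'])] += sale['amount']

for (region, product), total in totals.items():
    print(f"{region} - {product}: {total}")
# East - Widget: 250
# West - Gadget: 200
# West - Widget: 300
```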

defaultdict(set) -- Unique Grouping

Collect unique values per key automatically.

from collections import defaultdict
 
edges = [
    ('Alice', 'Bob'), ('Alice', 'Charlie'),
    ('Bob', 'Alice'), ('Bob', 'Diana'),
    ('Alice', 'Bob'),  # duplicate
]
 
connections = defaultdict(set)
for person, friend in edges:
    connections[person].add(friend)
 
for person, friends in connections.items():
    # Sets have no guaranteed order, so sort them for stable output
    print(f"{person} is connected to: {sorted(friends)}")
# Alice is connected to: ['Bob', 'Charlie']
# Bob is connected to: ['Alice', 'Diana']
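To see the difference between the two grouping factories directly, this sketch feeds the same edge list, including its duplicate ('Alice', 'Bob') pair, into both a defaultdict(list) and a defaultdict(set):

```python
from collections import defaultdict

edges = [
    ('Alice', 'Bob'), ('Alice', 'Charlie'),
    ('Bob', 'Alice'), ('Bob', 'Diana'),
    ('Alice', 'Bob'),  # duplicate
]

as_list = defaultdict(list)  # keeps duplicates, in insertion order
as_set = defaultdict(set)    # keeps one copy of each friend, unordered
for person, friend in edges:
    as_list[person].append(friend)
    as_set[person].add(friend)

print(as_list['Alice'])         # ['Bob', 'Charlie', 'Bob']
print(sorted(as_set['Alice']))  # ['Bob', 'Charlie']
```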

defaultdict(lambda: value) -- Custom Defaults

When built-in types don't fit, use a lambda to return any default value.

from collections import defaultdict
 
# Default value of 'N/A' for missing entries
status = defaultdict(lambda: 'N/A')
status['server1'] = 'running'
status['server2'] = 'stopped'
print(status['server3'])   # N/A, and 'server3' is now stored as a key
 
# Default starting balance
accounts = defaultdict(lambda: 100.0)
accounts['alice'] += 50
accounts['bob'] -= 30
print(dict(accounts))  # {'alice': 150.0, 'bob': 70.0}
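One trade-off of lambda factories: the factory is part of a defaultdict's state, and pickle cannot serialize lambdas, so a defaultdict built on one cannot be saved with pickle. Built-in factories such as int round-trip fine. A small sketch:

```python
import pickle
from collections import defaultdict

status = defaultdict(lambda: 'N/A')
status['server1'] = 'running'

# The lambda factory cannot be pickled
try:
    pickle.dumps(status)
    lambda_ok = True
except (pickle.PicklingError, AttributeError):
    lambda_ok = False
print(lambda_ok)  # False

# A built-in factory round-trips, factory included
counts = defaultdict(int, {'a': 1})
restored = pickle.loads(pickle.dumps(counts))
print(restored['b'], dict(restored))  # 0 {'a': 1, 'b': 0}
```

A common workaround is to define the factory as a named module-level function (for example, def default_status(): return 'N/A'), which pickle stores by reference.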

Default Dictionary with Structured Values

from collections import defaultdict
 
def default_profile():
    return {'score': 0, 'level': 1, 'items': []}
 
profiles = defaultdict(default_profile)
profiles['player1']['score'] += 100
profiles['player1']['items'].append('sword')
profiles['player2']['level'] = 5
 
print(profiles['player1'])
# {'score': 100, 'level': 1, 'items': ['sword']}
print(profiles['player3'])
# {'score': 0, 'level': 1, 'items': []}
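default_profile works because it builds a new dictionary, with a new list, on every call. A factory that returns one pre-built object would hand that same object to every key, so all players would share a single profile. A sketch of the pitfall next to the fix:

```python
from collections import defaultdict

# Pitfall: the lambda returns the same dict object for every missing key
shared = {'score': 0, 'items': []}
bad = defaultdict(lambda: shared)
bad['player1']['score'] += 100
print(bad['player2']['score'])  # 100, because player2 got the same dict

# Fix: build a fresh value inside the factory on every call
def default_profile():
    return {'score': 0, 'level': 1, 'items': []}

good = defaultdict(default_profile)
good['player1']['score'] += 100
print(good['player2']['score'])  # 0
```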

Nested defaultdict -- Tree Structures

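One of the most powerful patterns is using defaultdict recursively to create auto-vivifying dictionaries: every missing key produces another tree(), so you can assign to a path of any depth without creating the intermediate levels first.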

from collections import defaultdict
 
def tree():
    return defaultdict(tree)
 
taxonomy = tree()
taxonomy['Animal']['Mammal']['Dog'] = 'Canis lupus familiaris'
taxonomy['Animal']['Mammal']['Cat'] = 'Felis catus'
taxonomy['Animal']['Bird']['Eagle'] = 'Aquila chrysaetos'
taxonomy['Plant']['Tree']['Oak'] = 'Quercus'
 
print(taxonomy['Animal']['Mammal']['Dog'])  # Canis lupus familiaris
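The flip side of auto-vivification is that reading a path that does not exist also creates it. A typo or a lookup for a missing branch leaves an empty subtree behind, so check with the in operator when you only want to read. A sketch reusing the same tree() factory:

```python
from collections import defaultdict

def tree():
    return defaultdict(tree)

taxonomy = tree()
taxonomy['Animal']['Mammal']['Dog'] = 'Canis lupus familiaris'

# Membership tests do not create anything
print('Reptile' in taxonomy['Animal'])  # False

# A plain read of a missing path silently adds empty branches
_ = taxonomy['Animal']['Reptile']['Snake']
print('Reptile' in taxonomy['Animal'])  # True
print(len(taxonomy['Animal']['Reptile']['Snake']))  # 0, an empty subtree
```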

Multi-Level Aggregation

from collections import defaultdict
 
sales_data = [
    (2025, 'Q1', 'Widget', 500),
    (2025, 'Q1', 'Gadget', 300),
    (2025, 'Q2', 'Widget', 700),
    (2026, 'Q1', 'Widget', 600),
]
 
report = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
for year, quarter, product, amount in sales_data:
    report[year][quarter][product] += amount
 
print(report[2025]['Q1']['Widget'])  # 500
print(report[2026]['Q1']['Widget'])  # 600
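Because each level is still a dictionary, rolling the nested report up to coarser totals is a matter of summing over .values(). A sketch computing per-year and per-product totals from the same sales_data:

```python
from collections import defaultdict

sales_data = [
    (2025, 'Q1', 'Widget', 500),
    (2025, 'Q1', 'Gadget', 300),
    (2025, 'Q2', 'Widget', 700),
    (2026, 'Q1', 'Widget', 600),
]

report = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
for year, quarter, product, amount in sales_data:
    report[year][quarter][product] += amount

# Per-year totals: sum every product in every quarter
year_totals = {
    year: sum(sum(products.values()) for products in quarters.values())
    for year, quarters in report.items()
}
print(year_totals)  # {2025: 1500, 2026: 600}

# Per-product totals across all years and quarters
product_totals = defaultdict(int)
for quarters in report.values():
    for products in quarters.values():
        for product, amount in products.items():
            product_totals[product] += amount
print(dict(product_totals))  # {'Widget': 1800, 'Gadget': 300}
```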

defaultdict vs dict.setdefault() vs get() -- Comparison

Feature	`defaultdict`	`dict.setdefault()`	`dict.get()`
Import required	Yes (`collections`)	No	No
Auto-creates key	Yes	Yes	No
Modifies dict on access	Yes	Yes	No
Custom default per call	No (one factory per instance)	Yes	Yes
Performance (repeated)	Fast (factory runs only for missing keys)	Slower (method call, and the default is built on every call)	Fast (no mutation, but read-only)
Best for	Repeated accumulation	One-off defaults	Read-only fallback

When to use each:

defaultdict: building up values over many iterations (counting, grouping)
dict.setdefault(): occasionally need a default for a specific key
dict.get(): read a value with a fallback without modifying the dictionary
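The same grouping task written with each approach makes the trade-offs concrete. The pairs data below is illustrative:

```python
from collections import defaultdict

pairs = [('fruit', 'apple'), ('veg', 'carrot'), ('fruit', 'pear')]

# defaultdict: the factory fills in missing keys on access
dd = defaultdict(list)
for kind, item in pairs:
    dd[kind].append(item)

# setdefault: inserts the default only if the key is missing, then returns the value
sd = {}
for kind, item in pairs:
    sd.setdefault(kind, []).append(item)  # a new [] is built on every call

# get: read-only fallback, the dictionary is never modified
print(sd.get('grain', []))  # []
print('grain' in sd)        # False

print(dict(dd) == sd)  # True
```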

Converting defaultdict Back to Regular dict

from collections import defaultdict
import json
 
def defaultdict_to_dict(d):
    """Recursively convert defaultdict to regular dict."""
    if isinstance(d, defaultdict):
        d = {k: defaultdict_to_dict(v) for k, v in d.items()}
    return d
 
nested = defaultdict(lambda: defaultdict(int))
nested['x']['y'] = 10
nested['a']['b'] = 20
 
regular = defaultdict_to_dict(nested)
print(json.dumps(regular))  # {"x": {"y": 10}, "a": {"b": 20}}
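Because defaultdict is a dict subclass, json.dumps also accepts it directly, nested or not. The recursive conversion is mainly useful for readable repr output and for getting a copy that no longer creates keys on lookup. A quick check:

```python
import json
from collections import defaultdict

nested = defaultdict(lambda: defaultdict(int))
nested['x']['y'] = 10
nested['a']['b'] = 20

# json treats any dict subclass as a JSON object, so no conversion is needed here
print(json.dumps(nested))  # {"x": {"y": 10}, "a": {"b": 20}}

# The repr, however, still shows the defaultdict wrappers and factory
print(repr(nested).startswith('defaultdict('))  # True
```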

You can also disable the default factory by setting it to None:

dd = defaultdict(int)
dd['a'] += 1
dd.default_factory = None
# dd['missing']  # Now raises KeyError
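With default_factory set to None, lookups behave like a plain dict: missing keys raise KeyError again, while get() and existing keys work as before. The factory can also be reassigned later. A sketch:

```python
from collections import defaultdict

dd = defaultdict(int)
dd['a'] += 1
dd.default_factory = None  # switch off auto-creation

print(dd['a'])         # 1, existing keys are unaffected
print(dd.get('b', 0))  # 0, get() still supplies a fallback

try:
    dd['missing']
except KeyError as exc:
    print('KeyError:', exc)  # KeyError: 'missing'

# Re-enable auto-creation by assigning a factory again
dd.default_factory = int
print(dd['missing'])   # 0
```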

Practical Examples

Adjacency List for Graphs

from collections import defaultdict, deque
 
edges = [('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'D'), ('D', 'E')]
 
graph = defaultdict(list)
for src, dst in edges:
    graph[src].append(dst)
    graph[dst].append(src)  # undirected graph
 
def bfs(graph, start):
    visited = set()
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        if node not in visited:
            visited.add(node)
            order.append(node)
            queue.extend(graph[node])
    return order
 
print(bfs(graph, 'A'))  # ['A', 'B', 'C', 'D', 'E']
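One caveat with a defaultdict-backed graph: bfs reads graph[node], so calling it with a node that has no edges quietly adds that node to the graph. This sketch shows the effect next to a read-only variant that uses graph.get(node, []) instead; bfs_readonly is a hypothetical name chosen for illustration.

```python
from collections import defaultdict, deque

edges = [('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'D'), ('D', 'E')]
graph = defaultdict(list)
for src, dst in edges:
    graph[src].append(dst)
    graph[dst].append(src)  # undirected graph

def bfs(graph, start):
    visited, queue, order = set(), deque([start]), []
    while queue:
        node = queue.popleft()
        if node not in visited:
            visited.add(node)
            order.append(node)
            queue.extend(graph[node])  # inserts node if it is missing
    return order

def bfs_readonly(graph, start):
    # Same traversal, but .get() never adds keys to the defaultdict
    visited, queue, order = set(), deque([start]), []
    while queue:
        node = queue.popleft()
        if node not in visited:
            visited.add(node)
            order.append(node)
            queue.extend(graph.get(node, []))
    return order

print(bfs_readonly(graph, 'Z'), 'Z' in graph)  # ['Z'] False
print(bfs(graph, 'Z'), 'Z' in graph)           # ['Z'] True
```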

Inverted Index for Text Search

from collections import defaultdict
 
documents = {
    'doc1': 'python is a great programming language',
    'doc2': 'data science uses python extensively',
    'doc3': 'machine learning with python and data',
}
 
index = defaultdict(set)
for doc_id, text in documents.items():
    for word in text.split():
        index[word.lower()].add(doc_id)
 
def search(query):
    # .get() avoids inserting an empty set for words that are not indexed
    return index.get(query.lower(), set())
 
print(sorted(search('python')))  # ['doc1', 'doc2', 'doc3']
print(sorted(search('data')))    # ['doc2', 'doc3']
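Because every index entry is a set, a query that must match all of its words reduces to set intersection. A sketch; search_all is a hypothetical helper name, and it assumes whitespace-separated queries, matching the indexing loop:

```python
from collections import defaultdict

documents = {
    'doc1': 'python is a great programming language',
    'doc2': 'data science uses python extensively',
    'doc3': 'machine learning with python and data',
}

index = defaultdict(set)
for doc_id, text in documents.items():
    for word in text.split():
        index[word.lower()].add(doc_id)

def search_all(query):
    """Return the IDs of documents containing every word in query."""
    words = query.lower().split()
    if not words:
        return set()
    # Copy the first set so that &= below never mutates the index itself;
    # .get() avoids adding empty entries for unknown words
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

print(sorted(search_all('python data')))     # ['doc2', 'doc3']
print(sorted(search_all('machine python')))  # ['doc3']
print(sorted(search_all('python rust')))     # []
```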

Visualizing Grouped Data with PyGWalker

After grouping and aggregating data with defaultdict, you often want to visualize the results. PyGWalker turns your pandas DataFrame into an interactive visualization interface directly in Jupyter:

from collections import defaultdict
import pandas as pd
import pygwalker as pyg
 
sales = [
    ('Electronics', 'Laptop', 1200),
    ('Electronics', 'Phone', 800),
    ('Clothing', 'Shirt', 45),
    ('Clothing', 'Jacket', 120),
]
 
totals = defaultdict(lambda: defaultdict(int))
for category, product, amount in sales:
    totals[category][product] += amount
 
rows = []
for category, products in totals.items():
    for product, total in products.items():
        rows.append({'category': category, 'product': product, 'total': total})
 
df = pd.DataFrame(rows)
walker = pyg.walk(df)

FAQ

What is defaultdict in Python?

defaultdict is a dictionary subclass in collections that provides a default value for missing keys. Instead of raising KeyError, it calls a factory function (like int, list, or set) to create and store a default value automatically.

What is the difference between dict and defaultdict?

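The main functional difference is how they handle missing keys. A regular dict raises KeyError; a defaultdict calls its default_factory to create, store, and return a default value. Only square-bracket lookups trigger the factory: get() and the in operator behave exactly as they do on a regular dict. Otherwise a defaultdict supports every dict method and can be used anywhere a dict is expected.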

When should I use defaultdict(list) vs defaultdict(set)?

Use defaultdict(list) when you want to group items and preserve duplicates and insertion order. Use defaultdict(set) when you want to collect only unique items per key.

Can I serialize a defaultdict to JSON?

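Yes. Because defaultdict is a dict subclass, json.dumps serializes it directly, including nested defaultdict objects. Converting to a regular dict with a recursive function is still useful when you want clean repr output or a copy that no longer creates keys on lookup. You can also set default_factory = None to prevent accidental key creation before serialization.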

How do I create a nested defaultdict?

Define a recursive factory function: def tree(): return defaultdict(tree). For simpler two-level nesting, use defaultdict(lambda: defaultdict(int)).

Conclusion

Python's collections.defaultdict is one of the most practical tools in the standard library. It turns verbose, error-prone dictionary accumulation patterns into clean one-liners. Use defaultdict(int) for counting, defaultdict(list) for grouping, defaultdict(set) for unique collection, and nested defaultdict for hierarchical data.

The key takeaway: if you find yourself writing if key not in dict before every dictionary operation, replace that dictionary with a defaultdict. Your code will be shorter, usually faster, and far easier to maintain.
