Python defaultdict: Simplify Dictionary Operations with Default Values
Every Python developer has hit this wall: you write a clean loop to group or count items using a dictionary, run the code, and a KeyError crashes the whole script because one key didn't exist yet. The standard workaround is to sprinkle if key in dict checks or try/except KeyError blocks everywhere. Your logic for grouping ten lines of data suddenly balloons to twenty lines of defensive boilerplate.
This gets worse at scale. When you're building adjacency lists for graphs, aggregating log data, or counting word frequencies across millions of records, those guard clauses add up. They slow you down as a developer, make the code harder to review, and introduce subtle bugs when you forget a check in one branch.
Python's collections.defaultdict eliminates this entire category of problems. It is a dictionary subclass that calls a factory function to supply missing values automatically. No more KeyError, no more guard clauses, no more boilerplate.
What is defaultdict?
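The defaultdict is a subclass of Python's built-in dict. The key difference: when you look up a missing key with square brackets (dd[key]), defaultdict creates that key with a default value instead of raising KeyError. Lookups that never raise on a plain dict, such as get() and the in operator, do not trigger the default.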
from collections import defaultdict
# Regular dict raises KeyError
regular = {}
# regular['missing'] # KeyError: 'missing'
# defaultdict creates the value automatically
dd = defaultdict(int)
dd['missing'] # Returns 0, and now 'missing' is a key
print(dd) # defaultdict(<class 'int'>, {'missing': 0})
The constructor takes a factory function as its first argument. Common factories:
- int -- returns 0
- list -- returns []
- set -- returns set()
- str -- returns ""
- lambda: value -- returns any custom default
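To make the factory mechanism concrete, here is a minimal sketch. The names make_default and calls are hypothetical and exist only so you can observe when the factory runs: it is called with no arguments, and only when a square-bracket lookup hits a missing key.

```python
from collections import defaultdict

calls = []  # hypothetical log, used only to observe when the factory runs

def make_default():
    """Zero-argument factory: defaultdict calls it only for missing keys."""
    calls.append("called")
    return 0

dd = defaultdict(make_default)

dd["a"] += 1       # "a" is missing: the factory runs once, then 0 + 1 is stored
dd["a"] += 1       # "a" exists now: the factory is not called again
"b" in dd          # a membership test does not call the factory
dd.get("b")        # .get() does not call the factory either

print(len(calls))  # 1
print(dict(dd))    # {'a': 2}
```

Any callable that takes no arguments works as a factory, which is why int, list, set, and a lambda are all valid choices.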
defaultdict(int) -- The Counting Pattern
The most common use. Every new key starts at 0, so you can increment immediately.
from collections import defaultdict
words = ['apple', 'banana', 'apple', 'cherry', 'banana', 'apple']
# Without defaultdict
counts_regular = {}
for word in words:
    if word in counts_regular:
        counts_regular[word] += 1
    else:
        counts_regular[word] = 1
# With defaultdict(int) -- clean and direct
counts = defaultdict(int)
for word in words:
    counts[word] += 1
print(dict(counts))
# {'apple': 3, 'banana': 2, 'cherry': 1}
defaultdict(list) -- The Grouping Pattern
Group related items together. Each new key starts with an empty list.
from collections import defaultdict
students = [
    ('Math', 'Alice'),
    ('Science', 'Bob'),
    ('Math', 'Charlie'),
    ('Science', 'Diana'),
    ('Math', 'Eve'),
    ('History', 'Frank'),
]
groups = defaultdict(list)
for subject, student in students:
    groups[subject].append(student)
for subject, names in groups.items():
    print(f"{subject}: {', '.join(names)}")
# Math: Alice, Charlie, Eve
# Science: Bob, Diana
# History: Frank
Group Records by Multiple Fields
from collections import defaultdict
sales = [
    {'region': 'East', 'product': 'Widget', 'amount': 100},
    {'region': 'West', 'product': 'Gadget', 'amount': 200},
    {'region': 'East', 'product': 'Widget', 'amount': 150},
    {'region': 'West', 'product': 'Widget', 'amount': 300},
]
by_region_product = defaultdict(list)
for sale in sales:
    key = (sale['region'], sale['product'])
    by_region_product[key].append(sale['amount'])
for (region, product), amounts in by_region_product.items():
    total = sum(amounts)
    print(f"{region} - {product}: {amounts} (total: {total})")
defaultdict(set) -- Unique Grouping
Collect unique values per key automatically.
from collections import defaultdict
edges = [
    ('Alice', 'Bob'), ('Alice', 'Charlie'),
    ('Bob', 'Alice'), ('Bob', 'Diana'),
    ('Alice', 'Bob'), # duplicate
]
connections = defaultdict(set)
for person, friend in edges:
    connections[person].add(friend)
for person, friends in connections.items():
    print(f"{person} is connected to: {friends}")
# Alice is connected to: {'Bob', 'Charlie'}
# Bob is connected to: {'Alice', 'Diana'}
defaultdict(lambda: value) -- Custom Defaults
When built-in types don't fit, use a lambda to return any default value.
from collections import defaultdict
# Default value of 'N/A' for missing entries
status = defaultdict(lambda: 'N/A')
status['server1'] = 'running'
status['server2'] = 'stopped'
print(status['server3']) # N/A
# Default starting balance
accounts = defaultdict(lambda: 100.0)
accounts['alice'] += 50
accounts['bob'] -= 30
print(dict(accounts)) # {'alice': 150.0, 'bob': 70.0}
Default Dictionary with Structured Values
from collections import defaultdict
def default_profile():
    return {'score': 0, 'level': 1, 'items': []}
profiles = defaultdict(default_profile)
profiles['player1']['score'] += 100
profiles['player1']['items'].append('sword')
profiles['player2']['level'] = 5
print(profiles['player1'])
# {'score': 100, 'level': 1, 'items': ['sword']}
print(profiles['player3'])
# {'score': 0, 'level': 1, 'items': []}
Nested defaultdict -- Tree Structures
One of the most powerful patterns is using defaultdict recursively to create auto-vivifying dictionaries.
from collections import defaultdict
def tree():
    return defaultdict(tree)
taxonomy = tree()
taxonomy['Animal']['Mammal']['Dog'] = 'Canis lupus familiaris'
taxonomy['Animal']['Mammal']['Cat'] = 'Felis catus'
taxonomy['Animal']['Bird']['Eagle'] = 'Aquila chrysaetos'
taxonomy['Plant']['Tree']['Oak'] = 'Quercus'
print(taxonomy['Animal']['Mammal']['Dog']) # Canis lupus familiaris
Multi-Level Aggregation
from collections import defaultdict
sales_data = [
    (2025, 'Q1', 'Widget', 500),
    (2025, 'Q1', 'Gadget', 300),
    (2025, 'Q2', 'Widget', 700),
    (2026, 'Q1', 'Widget', 600),
]
report = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
for year, quarter, product, amount in sales_data:
    report[year][quarter][product] += amount
print(report[2025]['Q1']['Widget']) # 500
print(report[2026]['Q1']['Widget']) # 600
defaultdict vs dict.setdefault() vs get() -- Comparison
| Feature | defaultdict | dict.setdefault() | dict.get() |
|---|---|---|---|
| Import required | Yes (collections) | No | No |
| Auto-creates key | Yes | Yes | No |
| Modifies dict on access | Yes | Yes | No |
| Custom default per call | No (global factory) | Yes | Yes |
| Performance (repeated) | Fast (factory runs only on a miss) | Slower (method call and default object built on every call) | Fast (no mutation) |
| Best for | Repeated accumulation | One-off defaults | Read-only fallback |
When to use each:
- defaultdict: building up values over many iterations (counting, grouping)
- dict.setdefault(): occasionally need a default for a specific key
- dict.get(): read a value with a fallback without modifying the dictionary
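The three approaches can be compared side by side on the same small grouping task. This is a sketch with made-up sample data (pairs and the variable names are illustrative):

```python
from collections import defaultdict

pairs = [("fruit", "apple"), ("veg", "carrot"), ("fruit", "pear")]

# defaultdict: the factory supplies the empty list on first access
grouped_dd = defaultdict(list)
for kind, name in pairs:
    grouped_dd[kind].append(name)

# dict.setdefault(): same result on a plain dict, default passed on every call
grouped_sd = {}
for kind, name in pairs:
    grouped_sd.setdefault(kind, []).append(name)

# dict.get(): read with a fallback; the dictionary is left unchanged
missing = grouped_sd.get("dairy", [])

print(dict(grouped_dd) == grouped_sd)  # True
print(missing, "dairy" in grouped_sd)  # [] False
```

Note that setdefault() builds a new empty list on every call, even when the key already exists, which is one reason it tends to be the slower choice inside tight loops.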
Converting defaultdict Back to Regular dict
from collections import defaultdict
import json
def defaultdict_to_dict(d):
    """Recursively convert defaultdict to regular dict."""
    if isinstance(d, defaultdict):
        d = {k: defaultdict_to_dict(v) for k, v in d.items()}
    return d
nested = defaultdict(lambda: defaultdict(int))
nested['x']['y'] = 10
nested['a']['b'] = 20
regular = defaultdict_to_dict(nested)
print(json.dumps(regular)) # {"x": {"y": 10}, "a": {"b": 20}}
You can also disable the default factory by setting it to None:
dd = defaultdict(int)
dd['a'] += 1
dd.default_factory = None
# dd['missing'] # Now raises KeyError
Practical Examples
Adjacency List for Graphs
from collections import defaultdict, deque
edges = [('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'D'), ('D', 'E')]
graph = defaultdict(list)
for src, dst in edges:
    graph[src].append(dst)
    graph[dst].append(src) # undirected graph
def bfs(graph, start):
    visited = set()
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        if node not in visited:
            visited.add(node)
            order.append(node)
            queue.extend(graph[node])
    return order
print(bfs(graph, 'A')) # ['A', 'B', 'C', 'D', 'E']
Inverted Index for Text Search
from collections import defaultdict
documents = {
    'doc1': 'python is a great programming language',
    'doc2': 'data science uses python extensively',
    'doc3': 'machine learning with python and data',
}
index = defaultdict(set)
for doc_id, text in documents.items():
    for word in text.split():
        index[word.lower()].add(doc_id)
def search(query):
    return index.get(query.lower(), set())
print(search('python')) # {'doc1', 'doc2', 'doc3'}
print(search('data')) # {'doc2', 'doc3'}
Visualizing Grouped Data with PyGWalker
After grouping and aggregating data with defaultdict, you often want to visualize the results. PyGWalker turns your pandas DataFrame into an interactive visualization interface directly in Jupyter:
from collections import defaultdict
import pandas as pd
import pygwalker as pyg
sales = [
    ('Electronics', 'Laptop', 1200),
    ('Electronics', 'Phone', 800),
    ('Clothing', 'Shirt', 45),
    ('Clothing', 'Jacket', 120),
]
totals = defaultdict(lambda: defaultdict(int))
for category, product, amount in sales:
    totals[category][product] += amount
rows = []
for category, products in totals.items():
    for product, total in products.items():
        rows.append({'category': category, 'product': product, 'total': total})
df = pd.DataFrame(rows)
walker = pyg.walk(df)
FAQ
What is defaultdict in Python?
defaultdict is a dictionary subclass in collections that provides a default value for missing keys. Instead of raising KeyError, it calls a factory function (like int, list, or set) to create and store a default value automatically.
What is the difference between dict and defaultdict?
The only functional difference is how they handle missing keys. A regular dict raises KeyError. A defaultdict calls its default_factory function to create a default value. In all other respects they behave identically.
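A short sketch of that difference, using illustrative variable names. The two objects compare equal and share the dict interface; they diverge only when a missing key is looked up with square brackets.

```python
from collections import defaultdict

plain = {"a": 1}
dd = defaultdict(int, {"a": 1})   # extra arguments are passed on to dict()

print(dd == plain)                # True: comparison ignores the factory
print(isinstance(dd, dict))       # True

try:
    plain["missing"]
except KeyError:
    plain_result = "KeyError"
dd_result = dd["missing"]         # 0, and "missing" is now stored in dd

print(plain_result, dd_result)    # KeyError 0
print(sorted(dd))                 # ['a', 'missing']
```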
When should I use defaultdict(list) vs defaultdict(set)?
Use defaultdict(list) when you want to group items and preserve duplicates and insertion order. Use defaultdict(set) when you want to collect only unique items per key.
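For instance, feeding the same records into both containers shows the trade-off. The visits data here is made up for illustration:

```python
from collections import defaultdict

visits = [("alice", "home"), ("alice", "cart"), ("alice", "home"), ("bob", "home")]

pages_list = defaultdict(list)
pages_set = defaultdict(set)
for user, page in visits:
    pages_list[user].append(page)   # keeps duplicates and order
    pages_set[user].add(page)       # keeps each page once, unordered

print(pages_list["alice"])          # ['home', 'cart', 'home']
print(sorted(pages_set["alice"]))   # ['cart', 'home']
```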
Can I serialize a defaultdict to JSON?
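Yes. Because defaultdict is a dict subclass, json.dumps() accepts it directly, including nested defaultdict objects. Converting to a regular dict with a recursive function is still useful when you want a plain dict for printing or for code that should not create keys by accident. You can also set default_factory = None to prevent accidental key creation before serialization.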
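As a hedged illustration with the standard json module: a defaultdict can be passed to json.dumps() as is, and setting default_factory to None beforehand makes a mistyped key raise instead of adding an empty entry. The report structure below is hypothetical.

```python
import json
from collections import defaultdict

report = defaultdict(lambda: defaultdict(int))
report["2025"]["Q1"] += 500
report["2025"]["Q2"] += 700

# Freeze the outer level so a mistyped key raises instead of adding an entry
report.default_factory = None

payload = json.dumps(report, sort_keys=True)
print(payload)  # {"2025": {"Q1": 500, "Q2": 700}}

try:
    report["2026"]
except KeyError:
    print("2026 was not created")
```

Only the outer dictionary is frozen in this sketch; the inner defaultdict(int) objects keep their own factories unless you reset those too.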
How do I create a nested defaultdict?
Define a recursive factory function: def tree(): return defaultdict(tree). For simpler two-level nesting, use defaultdict(lambda: defaultdict(int)).
Conclusion
Python's collections.defaultdict is one of the most practical tools in the standard library. It turns verbose, error-prone dictionary accumulation patterns into clean one-liners. Use defaultdict(int) for counting, defaultdict(list) for grouping, defaultdict(set) for unique collection, and nested defaultdict for hierarchical data.
The key takeaway: if you find yourself writing if key not in dict before every dictionary operation, replace that dictionary with a defaultdict. Your code will be shorter, easier to maintain, and often faster in accumulation loops. Just remember that reading a missing key with square brackets also inserts it.