
Python Dataclasses: A Complete Guide to @dataclass Decorator


Writing Python classes often involves repetitive boilerplate code. You define __init__ to initialize attributes, __repr__ for readable output, __eq__ for comparisons, and sometimes __hash__ for hashability. This manual implementation becomes tedious for data-holding classes, especially when managing configuration objects, API responses, or database records.

Python 3.7 introduced dataclasses through PEP 557, automating this boilerplate while maintaining the flexibility of regular classes. The @dataclass decorator generates special methods automatically based on type annotations, reducing code from dozens of lines to just a few. This guide demonstrates how to leverage dataclasses for cleaner, more maintainable Python code.


Why Dataclasses Exist: Solving the Boilerplate Problem

Traditional Python classes require explicit method definitions for common operations. Consider this standard class for storing user data:

class User:
    def __init__(self, name, email, age):
        self.name = name
        self.email = email
        self.age = age
 
    def __repr__(self):
        return f"User(name={self.name!r}, email={self.email!r}, age={self.age!r})"
 
    def __eq__(self, other):
        if not isinstance(other, User):
            return NotImplemented
        return (self.name, self.email, self.age) == (other.name, other.email, other.age)

With dataclasses, this reduces to:

from dataclasses import dataclass
 
@dataclass
class User:
    name: str
    email: str
    age: int

The decorator generates __init__, __repr__, and __eq__ automatically from the type annotations, eliminating roughly a dozen lines of boilerplate while keeping identical behavior.

Basic @dataclass Syntax

The simplest dataclass requires only type annotations for fields:

from dataclasses import dataclass
 
@dataclass
class Product:
    name: str
    price: float
    quantity: int
 
product = Product("Laptop", 999.99, 5)
print(product)  # Product(name='Laptop', price=999.99, quantity=5)
 
product2 = Product("Laptop", 999.99, 5)
print(product == product2)  # True

The decorator accepts parameters to customize behavior:

@dataclass(
    init=True,       # Generate __init__ (default: True)
    repr=True,       # Generate __repr__ (default: True)
    eq=True,         # Generate __eq__ (default: True)
    order=False,     # Generate comparison methods (default: False)
    frozen=False,    # Make instances immutable (default: False)
    unsafe_hash=False  # Generate __hash__ (default: False)
)
class Config:
    host: str
    port: int

Field Types and Default Values

Dataclasses support default values for fields. Fields without defaults must appear before fields with defaults:

from dataclasses import dataclass
 
@dataclass
class Server:
    host: str
    port: int = 8080
    protocol: str = "http"
 
server1 = Server("localhost")
print(server1)  # Server(host='localhost', port=8080, protocol='http')
 
server2 = Server("api.example.com", 443, "https")
print(server2)  # Server(host='api.example.com', port=443, protocol='https')

For mutable default values like lists or dictionaries, use default_factory to avoid shared references:

from dataclasses import dataclass, field
 
# WRONG - all instances share the same list
@dataclass
class WrongConfig:
    tags: list = []  # Raises ValueError: mutable defaults are not allowed
 
# CORRECT - each instance gets a new list
@dataclass
class CorrectConfig:
    tags: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)
 
config1 = CorrectConfig()
config2 = CorrectConfig()
 
config1.tags.append("production")
print(config1.tags)  # ['production']
print(config2.tags)  # [] - separate list

The field() Function: Advanced Field Configuration

The field() function provides granular control over individual fields:

from dataclasses import dataclass, field
from typing import List
 
@dataclass
class Employee:
    name: str
    employee_id: int
    salary: float = field(repr=False)  # Hide salary in repr
    skills: List[str] = field(default_factory=list)
    _internal_id: str = field(init=False, repr=False)  # Not in __init__
    performance_score: float = field(default=0.0, compare=False)  # Exclude from comparison
 
    def __post_init__(self):
        self._internal_id = f"EMP_{self.employee_id:06d}"
 
emp = Employee("Alice", 12345, 85000.0, ["Python", "SQL"])
print(emp)  # Employee(name='Alice', employee_id=12345, skills=['Python', 'SQL'], performance_score=0.0)
print(emp._internal_id)  # EMP_012345

Key field() parameters:

| Parameter | Type | Description |
|---|---|---|
| default | Any | Default value for the field |
| default_factory | Callable | Zero-argument function returning the default value |
| init | bool | Include field in __init__ (default: True) |
| repr | bool | Include field in __repr__ (default: True) |
| compare | bool | Include field in comparison methods (default: True) |
| hash | bool | Include field in __hash__ (default: None, which follows compare) |
| metadata | dict | Arbitrary metadata (not interpreted by the dataclasses module) |
| kw_only | bool | Make field keyword-only (Python 3.10+) |
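
The interaction between compare and hash is worth seeing in action. As an illustrative sketch (the Tag class is hypothetical): because hash defaults to None, a field excluded from comparison is also excluded from the generated hash.

```python
from dataclasses import dataclass, field

# Hypothetical example: `note` is excluded from comparison. Since
# field(hash=...) defaults to None, the generated __hash__ follows
# the compare setting, so `note` is ignored by __eq__ and __hash__.
@dataclass(frozen=True)
class Tag:
    name: str
    note: str = field(default="", compare=False)

a = Tag("urgent", note="added by Alice")
b = Tag("urgent", note="added by Bob")

print(a == b)              # True - note is excluded from __eq__
print(hash(a) == hash(b))  # True - and from __hash__ as well
```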

The metadata parameter stores arbitrary information accessible via fields():

from dataclasses import dataclass, field, fields
 
@dataclass
class APIRequest:
    endpoint: str = field(metadata={"description": "API endpoint path"})
    method: str = field(default="GET", metadata={"choices": ["GET", "POST", "PUT", "DELETE"]})
 
for f in fields(APIRequest):
    print(f"{f.name}: {f.metadata}")
# endpoint: {'description': 'API endpoint path'}
# method: {'choices': ['GET', 'POST', 'PUT', 'DELETE']}

Type Annotations with Dataclasses

Dataclasses rely on type annotations but don't enforce them at runtime. Use typing module for complex types:

from dataclasses import dataclass, field
from typing import List, Dict, Optional, Union, Tuple
from datetime import datetime
 
@dataclass
class DataAnalysisJob:
    job_id: str
    dataset_path: str
    columns: List[str]
    filters: Dict[str, Union[str, int, float]]
    output_format: str = "csv"
    created_at: datetime = field(default_factory=datetime.now)
    completed_at: Optional[datetime] = None
    error_message: Optional[str] = None
    results: Optional[Dict[str, Tuple[float, float]]] = None
 
job = DataAnalysisJob(
    job_id="job_001",
    dataset_path="/data/sales.csv",
    columns=["date", "revenue", "region"],
    filters={"year": 2026, "region": "US"}
)

For runtime type checking, integrate with libraries like pydantic or use __post_init__ validation.
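
A minimal sketch of the __post_init__ approach (the StrictPoint class here is hypothetical), checking each value against its annotation via fields():

```python
from dataclasses import dataclass, fields

# Hypothetical sketch: validate field types at runtime in __post_init__.
# This works for plain classes like str/int/float; generics such as
# List[str] need a library like pydantic. Note: under
# `from __future__ import annotations`, f.type would be a string and
# this check would need typing.get_type_hints() instead.
@dataclass
class StrictPoint:
    x: float
    y: float

    def __post_init__(self):
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name} expected {f.type.__name__}, "
                    f"got {type(value).__name__}"
                )

StrictPoint(1.0, 2.0)  # OK
try:
    StrictPoint(1.0, "oops")
except TypeError as e:
    print(e)  # y expected float, got str
```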

frozen=True: Creating Immutable Dataclasses

Set frozen=True to make instances immutable after creation, similar to named tuples:

from dataclasses import dataclass
 
@dataclass(frozen=True)
class Point:
    x: float
    y: float
 
    def distance_from_origin(self):
        return (self.x**2 + self.y**2) ** 0.5
 
point = Point(3.0, 4.0)
print(point.distance_from_origin())  # 5.0
 
# Attempting to modify raises dataclasses.FrozenInstanceError,
# which is a subclass of AttributeError
try:
    point.x = 5.0
except AttributeError as e:
    print(f"Error: {e}")  # Error: cannot assign to field 'x'

Frozen dataclasses are hashable by default if all fields are hashable, enabling their use in sets and as dictionary keys:

@dataclass(frozen=True)
class Coordinate:
    latitude: float
    longitude: float
 
locations = {
    Coordinate(40.7128, -74.0060): "New York",
    Coordinate(51.5074, -0.1278): "London"
}
 
print(locations[Coordinate(40.7128, -74.0060)])  # New York

The __post_init__ Method: Validation and Computed Fields

The __post_init__ method executes after __init__, allowing validation and computed field initialization:

from dataclasses import dataclass, field
from datetime import datetime
 
@dataclass
class BankAccount:
    account_number: str
    balance: float
    created_at: datetime = field(default_factory=datetime.now)
    account_type: str = field(init=False)
 
    def __post_init__(self):
        if self.balance < 0:
            raise ValueError("Initial balance cannot be negative")
 
        # Compute account_type based on balance
        if self.balance >= 100000:
            self.account_type = "Premium"
        elif self.balance >= 10000:
            self.account_type = "Gold"
        else:
            self.account_type = "Standard"
 
account = BankAccount("ACC123456", 50000.0)
print(account.account_type)  # Gold

For fields with init=False that depend on other fields, use __post_init__:

from dataclasses import dataclass, field
 
@dataclass
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)
    perimeter: float = field(init=False)
 
    def __post_init__(self):
        self.area = self.width * self.height
        self.perimeter = 2 * (self.width + self.height)
 
rect = Rectangle(5.0, 3.0)
print(f"Area: {rect.area}, Perimeter: {rect.perimeter}")  # Area: 15.0, Perimeter: 16.0

Inheritance with Dataclasses

Dataclasses support inheritance with automatic field merging:

from dataclasses import dataclass
 
@dataclass
class Animal:
    name: str
    age: int
 
@dataclass
class Dog(Animal):
    breed: str
    is_good_boy: bool = True
 
dog = Dog("Buddy", 5, "Golden Retriever")
print(dog)  # Dog(name='Buddy', age=5, breed='Golden Retriever', is_good_boy=True)

Subclasses inherit parent fields and can add new ones. Because inherited fields come first in the generated __init__, a non-default field in a subclass cannot follow a parent field that has a default:

from dataclasses import dataclass
 
@dataclass
class BaseConfig:
    environment: str = "production"
 
# ERROR: Non-default field 'api_key' cannot follow default field 'environment'
# @dataclass
# class APIConfig(BaseConfig):
#     api_key: str
 
# CORRECT: Use default or rearrange fields
@dataclass
class APIConfig(BaseConfig):
    api_key: str = ""  # Provide default
    timeout: int = 30

Python 3.10+ introduced kw_only to resolve this:

from dataclasses import dataclass
 
@dataclass
class BaseConfig:
    environment: str = "production"
 
@dataclass(kw_only=True)
class APIConfig(BaseConfig):
    api_key: str  # Must be passed as keyword argument
    timeout: int = 30
 
config = APIConfig(api_key="secret_key_123")  # OK
# config = APIConfig("secret_key_123")  # TypeError

slots=True: Memory Efficiency (Python 3.10+)

Python 3.10 added slots=True to define __slots__, reducing memory overhead:

from dataclasses import dataclass
import sys
 
@dataclass
class RegularUser:
    username: str
    email: str
    age: int
 
@dataclass(slots=True)
class SlottedUser:
    username: str
    email: str
    age: int
 
regular = RegularUser("john", "john@example.com", 30)
slotted = SlottedUser("jane", "jane@example.com", 28)
 
# A regular instance pays for a per-instance __dict__ on top of the
# object itself; a slotted instance stores attributes inline
print(f"Regular __dict__: {sys.getsizeof(regular.__dict__)} bytes")  # ~104 bytes (varies by version)
print(f"Slotted instance: {sys.getsizeof(slotted)} bytes")           # ~64 bytes (varies by version)

Slotted dataclasses provide 30-40% memory savings and faster attribute access but sacrifice dynamic attribute addition:

regular.new_attribute = "allowed"  # OK
# slotted.new_attribute = "error"  # AttributeError

kw_only=True: Keyword-Only Fields (Python 3.10+)

Force all fields to be keyword-only for clearer instantiation:

from dataclasses import dataclass
 
@dataclass(kw_only=True)
class DatabaseConnection:
    host: str
    port: int
    username: str
    password: str
    database: str = "default"
 
# Must use keyword arguments
conn = DatabaseConnection(
    host="localhost",
    port=5432,
    username="admin",
    password="secret"
)
 
# Positional arguments raise TypeError
# conn = DatabaseConnection("localhost", 5432, "admin", "secret")

Combine kw_only with per-field control:

from dataclasses import dataclass, field
 
@dataclass
class MixedArgs:
    required_positional: str
    optional_positional: int = 0
    required_keyword: str = field(kw_only=True)
    optional_keyword: bool = field(default=False, kw_only=True)
 
obj = MixedArgs("value", 10, required_keyword="kw_value")

Comparison: dataclass vs Alternatives

| Feature | dataclass | namedtuple | TypedDict | Pydantic | attrs |
|---|---|---|---|---|---|
| Mutability | Mutable (default) | Immutable | N/A (dict subclass) | Mutable | Configurable |
| Type validation | Annotations only | No | Annotations only | Runtime validation | Runtime validation |
| Default values | Yes | Yes | No | Yes | Yes |
| Methods | Full class support | Limited | No | Full class support | Full class support |
| Inheritance | Yes | No | Limited | Yes | Yes |
| Memory overhead | Moderate | Low | Low | Higher | Moderate |
| Slots support | Yes (3.10+) | No | No | Yes | Yes |
| Performance | Fast | Fastest | Fast | Slower (validation) | Fast |
| Built-in | Yes (3.7+) | Yes | Yes (3.8+) | No | No |

Choose dataclasses for:

  • Standard Python projects without dependencies
  • Simple data containers with type hints
  • When frozen/mutable flexibility is needed
  • Inheritance hierarchies

Choose Pydantic for:

  • API request/response validation
  • Configuration management with strict validation
  • JSON schema generation

Choose namedtuple for:

  • Lightweight immutable containers
  • Maximum memory efficiency
  • Python < 3.7 compatibility
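
To make the namedtuple trade-off concrete, here is a small side-by-side sketch (the Point names are made up): both forms are immutable and compare by value, but a namedtuple is still a tuple, so it also compares equal to plain tuples and supports index access.

```python
from collections import namedtuple
from dataclasses import dataclass

PointNT = namedtuple("PointNT", ["x", "y"])

@dataclass(frozen=True)
class PointDC:
    x: float
    y: float

nt = PointNT(1, 2)
dc = PointDC(1, 2)

print(nt == (1, 2))         # True - namedtuples are tuples
print(nt[0])                # 1 - index access works
print(dc == PointDC(1, 2))  # True - value equality
print(dc == (1, 2))         # False - dataclasses compare only to the same type
```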

Converting to/from Dictionaries

Dataclasses provide asdict() and astuple() for serialization:

from dataclasses import dataclass, asdict, astuple
 
@dataclass
class Config:
    host: str
    port: int
    ssl_enabled: bool = True
 
config = Config("api.example.com", 443)
 
# Convert to dictionary
config_dict = asdict(config)
print(config_dict)  # {'host': 'api.example.com', 'port': 443, 'ssl_enabled': True}
 
# Convert to tuple
config_tuple = astuple(config)
print(config_tuple)  # ('api.example.com', 443, True)

For nested dataclasses:

from dataclasses import dataclass, asdict
 
@dataclass
class Address:
    street: str
    city: str
    zipcode: str
 
@dataclass
class Person:
    name: str
    address: Address
 
person = Person("Alice", Address("123 Main St", "Springfield", "12345"))
person_dict = asdict(person)
print(person_dict)
# {'name': 'Alice', 'address': {'street': '123 Main St', 'city': 'Springfield', 'zipcode': '12345'}}

Dataclasses with JSON Serialization

Dataclasses don't natively support JSON serialization, but integration is straightforward:

import json
from dataclasses import dataclass, asdict
from datetime import datetime
 
@dataclass
class Event:
    name: str
    timestamp: datetime
    attendees: int
 
    def to_json(self):
        data = asdict(self)
        # Custom serialization for datetime
        data['timestamp'] = self.timestamp.isoformat()
        return json.dumps(data)
 
    @classmethod
    def from_json(cls, json_str):
        data = json.loads(json_str)
        data['timestamp'] = datetime.fromisoformat(data['timestamp'])
        return cls(**data)
 
event = Event("Python Conference", datetime.now(), 500)
json_str = event.to_json()
print(json_str)
 
restored = Event.from_json(json_str)
print(restored)

For complex scenarios, use dataclasses-json library or Pydantic.
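
Short of adding a dependency, a reusable json.dumps default hook can handle datetime values for any dataclass. This is a generic sketch (the helper name and Meeting class are made up):

```python
import json
from dataclasses import dataclass, asdict, is_dataclass
from datetime import datetime, date

def to_json(obj):
    """Serialize a dataclass instance, converting datetimes to ISO strings."""
    if not is_dataclass(obj):
        raise TypeError("expected a dataclass instance")

    def default(value):
        # Called by json.dumps for values it cannot serialize itself
        if isinstance(value, (datetime, date)):
            return value.isoformat()
        raise TypeError(f"not JSON serializable: {type(value).__name__}")

    return json.dumps(asdict(obj), default=default)

@dataclass
class Meeting:  # hypothetical example class
    title: str
    start: datetime

print(to_json(Meeting("Standup", datetime(2026, 1, 5, 9, 30))))
# {"title": "Standup", "start": "2026-01-05T09:30:00"}
```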

Real-World Patterns

Configuration Objects

from dataclasses import dataclass, field
from typing import List
 
@dataclass
class AppConfig:
    app_name: str
    version: str
    debug: bool = False
    allowed_hosts: List[str] = field(default_factory=lambda: ["localhost"])
    database_url: str = "sqlite:///app.db"
    cache_timeout: int = 300
 
    def __post_init__(self):
        if self.debug:
            print(f"Running {self.app_name} v{self.version} in DEBUG mode")
 
config = AppConfig("DataAnalyzer", "2.1.0", debug=True)

API Response Models

from dataclasses import dataclass, field
from typing import List, Optional
from datetime import datetime
 
@dataclass
class APIResponse:
    status: str
    data: Optional[List[dict]] = None
    error_message: Optional[str] = None
    timestamp: datetime = field(default_factory=datetime.now)
 
    @property
    def is_success(self):
        return self.status == "success"
 
response = APIResponse("success", data=[{"id": 1, "name": "Dataset A"}])
print(response.is_success)  # True

Database Records with PyGWalker Integration

from dataclasses import dataclass, asdict
from typing import List
import pandas as pd
 
@dataclass
class SalesRecord:
    date: str
    product: str
    revenue: float
    region: str
    quantity: int
 
# Create sample data
records = [
    SalesRecord("2026-01-01", "Laptop", 1299.99, "US", 5),
    SalesRecord("2026-01-02", "Mouse", 29.99, "EU", 50),
    SalesRecord("2026-01-03", "Keyboard", 89.99, "US", 20),
]
 
# Convert to DataFrame for visualization with PyGWalker
df = pd.DataFrame([asdict(r) for r in records])
 
# Use PyGWalker for interactive data exploration
# import pygwalker as pyg
# walker = pyg.walk(df)
# This creates a Tableau-like interface to visualize your dataclass-based data

Dataclasses excel at structuring data before visualization. PyGWalker converts DataFrames into interactive visual interfaces, making dataclass-based data analysis workflows seamless.

Performance Benchmarks vs Regular Classes

import timeit
from dataclasses import dataclass
 
# Regular class
class RegularClass:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
 
    def __repr__(self):
        return f"RegularClass(x={self.x}, y={self.y}, z={self.z})"
 
    def __eq__(self, other):
        return (self.x, self.y, self.z) == (other.x, other.y, other.z)
 
@dataclass
class DataClass:
    x: int
    y: int
    z: int
 
# Benchmark instantiation
regular_time = timeit.timeit(lambda: RegularClass(1, 2, 3), number=1000000)
dataclass_time = timeit.timeit(lambda: DataClass(1, 2, 3), number=1000000)
 
print(f"Regular class: {regular_time:.4f}s")
print(f"Dataclass: {dataclass_time:.4f}s")
# Timings are typically within a few percent of each other: the decorator's
# overhead is paid once at class-definition time, and the generated __init__
# is ordinary Python code

With slots=True (Python 3.10+), dataclasses match or exceed regular class performance while reducing memory usage by 30-40%.

Advanced Patterns: Custom Field Ordering

Dataclasses with order=True integrate seamlessly with Python's sorting mechanisms:

from dataclasses import dataclass, field
 
def sort_by_priority(items):
    return sorted(items, key=lambda x: x.priority, reverse=True)
 
@dataclass(order=True)
class Task:
    priority: int
    name: str = field(compare=False)
    description: str = field(compare=False)
 
tasks = [
    Task(3, "Review PR", "Code review for feature X"),
    Task(1, "Write docs", "Documentation update"),
    Task(5, "Fix bug", "Critical production issue"),
]
 
sorted_tasks = sorted(tasks)  # ascending by priority
for task in sorted_tasks:
    print(f"Priority {task.priority}: {task.name}")
# Priority 1: Write docs
# Priority 3: Review PR
# Priority 5: Fix bug
 
# The helper above gives descending order instead
for task in sort_by_priority(tasks):
    print(f"Priority {task.priority}: {task.name}")
# Priority 5: Fix bug
# Priority 3: Review PR
# Priority 1: Write docs

Best Practices and Gotchas

  1. Always use default_factory for mutable defaults: Never assign [] or {} directly
  2. Type hints are required: Dataclasses rely on annotations, not values
  3. Field order matters: Non-default fields before default fields
  4. frozen=True for immutable data: Use for hashable objects and thread safety
  5. Use __post_init__ sparingly: Excessive logic defeats dataclass simplicity
  6. Consider slots=True for large datasets: Significant memory savings in Python 3.10+
  7. Validate in __post_init__: Dataclasses don't enforce types at runtime


Conclusion

Python dataclasses eliminate boilerplate code while preserving the full power of classes. The @dataclass decorator automatically generates initialization, representation, and comparison methods, reducing development time and maintenance burden. From configuration objects to API models and database records, dataclasses provide a clean, type-annotated approach to data-holding classes.

Key advantages include automatic method generation, customizable field behavior through field(), immutability with frozen=True, validation via __post_init__, and memory efficiency with slots=True. While alternatives like namedtuples and Pydantic serve specific use cases, dataclasses strike an optimal balance between simplicity and functionality for most Python projects.

For data analysis workflows, combining dataclasses with tools like PyGWalker creates powerful pipelines where structured data models feed directly into interactive visualizations, streamlining everything from data ingestion to insight generation.
