# dataclasses Module Complexity
The dataclasses module provides a decorator to automatically generate special methods for classes that are primarily used to store data.
## Complexity Reference

| Operation | Time | Space | Notes |
|---|---|---|---|
| `@dataclass` decorator | O(n) | O(n) | n = number of fields; runs once, at class definition |
| Field access | O(1) | O(1) | Direct attribute access |
| `__init__()` | O(n) | O(n) | n = number of fields |
| `__repr__()` | O(n) | O(n) | n = number of fields |
| `__eq__()` | O(n) | O(1) | n = number of fields |
| `replace()` | O(n) | O(n) | Creates a new instance via `__init__` |
| `asdict()` | O(N) | O(N) | N = total fields and container elements across all nested objects; recursive, deep-copies other values |
| `astuple()` | O(N) | O(N) | Same as `asdict()` |
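The decorator's O(n) cost in the table is paid once, when the `class` statement executes. Creating instances later only runs the generated methods. The following small sketch uses the standard `dataclasses.fields()` and `inspect.signature()` to show what was generated for a two-field class. The `Point` class is only an illustration.

```python
import inspect
from dataclasses import dataclass, fields

@dataclass
class Point:
    x: float
    y: float

# fields() returns one Field object per declared field (n = 2 here)
print([f.name for f in fields(Point)])  # ['x', 'y']

# The generated __init__ takes one parameter per field
print(list(inspect.signature(Point.__init__).parameters))  # ['self', 'x', 'y']
```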
## Basic Dataclass

### Simple Definition

```python
from dataclasses import dataclass

# Define dataclass - O(n) for n fields
@dataclass
class Point:
    x: float
    y: float

# Create instance - O(n)
p = Point(x=1.0, y=2.0)

# Access fields - O(1)
print(p.x, p.y)  # 1.0 2.0

# Automatic __repr__ - O(n)
print(p)  # Point(x=1.0, y=2.0)

# Automatic __eq__ - O(n)
p2 = Point(1.0, 2.0)
print(p == p2)  # True
```
### Default Values

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Config:
    name: str
    timeout: int = 30
    retries: int = 3
    # None is a placeholder: a mutable default such as [] is rejected
    # with ValueError (use field(default_factory=list) instead)
    tags: Optional[list] = None

# Create with defaults - O(n)
config1 = Config("api")
config2 = Config("service", timeout=60)

# Access - O(1)
print(config1.timeout)  # 30
print(config2.timeout)  # 60
```
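The `None` placeholder in `Config` works around a rule that the decorator enforces. A bare `list`, `dict`, or `set` default raises `ValueError` at class definition, because every instance would share one object. Here is a minimal sketch of the failure and of the `default_factory` alternative. `BadConfig` and `GoodConfig` are illustrative names.

```python
from dataclasses import dataclass, field

# A bare list default is rejected when the decorator runs
rejected = False
try:
    @dataclass
    class BadConfig:
        tags: list = []  # would be one list shared by every instance
except ValueError as exc:
    rejected = True
    print(f"Rejected: {exc}")

# The supported alternative: a factory called once per instance
@dataclass
class GoodConfig:
    tags: list = field(default_factory=list)

a, b = GoodConfig(), GoodConfig()
a.tags.append("x")
print(a.tags, b.tags)  # ['x'] []
```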
### Default Factory

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    tags: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

# Each instance gets its own list and dict - O(n)
task1 = Task("task1")
task2 = Task("task2")

# Appending to one instance's list - O(1) amortized
task1.tags.append("urgent")
print(task1.tags)  # ['urgent']
print(task2.tags)  # [] - separate list
```
## Dataclass Features

### Optional Fields

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Person:
    name: str
    age: int
    email: Optional[str] = None

# Create - O(n)
person1 = Person("Alice", 30)
person2 = Person("Bob", 25, "bob@example.com")

# Access - O(1)
print(person1.email)  # None
print(person2.email)  # bob@example.com
```
### Frozen Dataclasses

```python
from dataclasses import dataclass, FrozenInstanceError

# Immutable dataclass - O(n)
@dataclass(frozen=True)
class Point:
    x: float
    y: float

p = Point(1.0, 2.0)

# Access - O(1)
print(p.x)  # 1.0

# Assignment raises FrozenInstanceError - O(1) check
try:
    p.x = 3.0
except FrozenInstanceError:
    print("Cannot modify frozen dataclass")
```
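Assignment on a frozen instance raises `FrozenInstanceError`, so a derived field cannot be filled in with a plain `self.attr = ...` inside `__post_init__`. A common workaround is to call `object.__setattr__()` directly, which is the same mechanism the generated `__init__` of a frozen class uses. The following sketch uses an illustrative `Rectangle` class.

```python
from dataclasses import dataclass, field, FrozenInstanceError

@dataclass(frozen=True)
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)  # derived, not an __init__ parameter

    def __post_init__(self):
        # self.area = ... would raise FrozenInstanceError here;
        # object.__setattr__ bypasses the frozen __setattr__
        object.__setattr__(self, "area", self.width * self.height)

r = Rectangle(3.0, 4.0)
print(r.area)  # 12.0

# After construction the instance is still read-only
blocked = False
try:
    r.area = 0.0
except FrozenInstanceError:
    blocked = True
print(blocked)  # True
```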
### Post-Init Processing

```python
import math
from dataclasses import dataclass

@dataclass
class Circle:
    radius: float

    # Called by the generated __init__ after fields are set - O(1)
    def __post_init__(self):
        if self.radius <= 0:
            raise ValueError("Radius must be positive")

    # Regular method - O(1)
    def area(self):
        return math.pi * self.radius ** 2

# Create - O(n), including validation
circle = Circle(5.0)

# Use method - O(1)
print(circle.area())  # 78.5...
```
## Dataclass Utilities

### Replace - Create Modified Copy

```python
from dataclasses import dataclass, replace

@dataclass
class User:
    username: str
    email: str
    active: bool = True

# Create - O(n)
user1 = User("alice", "alice@example.com")

# Replace fields - O(n), creates a new instance
user2 = replace(user1, email="alice.new@example.com")

# Original unchanged - O(1)
print(user1.email)  # alice@example.com
print(user2.email)  # alice.new@example.com
```
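`replace()` builds the new object by calling `__init__`, so any `__post_init__` validation runs again on the copy. An invalid replacement is therefore rejected instead of being silently produced. The sketch below reuses the `Circle` validation from the Post-Init Processing section.

```python
from dataclasses import dataclass, replace

@dataclass
class Circle:
    radius: float

    def __post_init__(self):
        if self.radius <= 0:
            raise ValueError("Radius must be positive")

c = Circle(5.0)
bigger = replace(c, radius=10.0)  # goes through __init__ and __post_init__
print(bigger)  # Circle(radius=10.0)

rejected = False
try:
    replace(c, radius=-1.0)
except ValueError as exc:
    rejected = True
    print(exc)  # Radius must be positive
```

A related rule: fields declared with `field(init=False)` cannot be passed to `replace()`, and doing so raises `ValueError`.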
### Convert to Dictionary

```python
from dataclasses import dataclass, asdict

@dataclass
class Book:
    title: str
    author: str
    pages: int

# Create - O(n)
book = Book("Python Guide", "John Doe", 350)

# Convert to dict - O(n) for a flat dataclass
book_dict = asdict(book)
# {'title': 'Python Guide', 'author': 'John Doe', 'pages': 350}

# Nested dataclasses - O(N), N = total fields and list elements visited
@dataclass
class Library:
    name: str
    books: list[Book]  # list of Book objects

library = Library("Central", [book, book])
lib_dict = asdict(library)  # Recursively converts nested dataclasses
```
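`asdict()` recurses into dataclasses, lists, tuples, and dicts, and deep-copies all other values. The result is therefore independent of the original and can go straight to `json.dumps()`. The optional `dict_factory` argument controls how the dict at each level is built. In this sketch, the `isbn` field and the `drop_none` helper are hypothetical additions for illustration.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import Optional

@dataclass
class Book:
    title: str
    author: str
    pages: int
    isbn: Optional[str] = None  # hypothetical extra field

@dataclass
class Library:
    name: str
    books: list = field(default_factory=list)

library = Library("Central", [Book("Python Guide", "John Doe", 350)])

# Nested Book objects inside the list are converted to dicts as well
data = asdict(library)
print(json.dumps(data))

# Hypothetical policy: dict_factory receives (name, value) pairs for each
# dataclass being converted; this one drops fields whose value is None
def drop_none(pairs):
    return {key: value for key, value in pairs if value is not None}

print(asdict(library, dict_factory=drop_none))

# The result is a copy: editing it leaves the original untouched
data["books"][0]["pages"] = 1
print(library.books[0].pages)  # 350
```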
### Convert to Tuple

```python
from dataclasses import dataclass, astuple

@dataclass
class Point:
    x: float
    y: float
    z: float

# Create - O(n)
p = Point(1.0, 2.0, 3.0)

# Convert to tuple - O(n)
p_tuple = astuple(p)  # (1.0, 2.0, 3.0)

# Nested dataclasses - O(N), recursive
@dataclass
class Polygon:
    points: list[Point]  # list of Point objects

poly = Polygon([p, p, p])
poly_tuple = astuple(poly)
# ([(1.0, 2.0, 3.0), (1.0, 2.0, 3.0), (1.0, 2.0, 3.0)],)
```
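If only the top-level values are needed, you can skip the recursion and copying in `astuple()` by reading the fields directly with `fields()` and `getattr()`. The sketch below contrasts the two on a small `Polygon`. The shallow conversion is an alternative technique and is not part of `astuple()` itself.

```python
from dataclasses import dataclass, astuple, fields

@dataclass
class Point:
    x: float
    y: float
    z: float

@dataclass
class Polygon:
    points: list

poly = Polygon([Point(1.0, 2.0, 3.0)])

# Recursive: nested Points become tuples, values are copied
deep = astuple(poly)
print(deep)  # ([(1.0, 2.0, 3.0)],)

# Shallow: one entry per top-level field, original objects reused - O(n)
shallow = tuple(getattr(poly, f.name) for f in fields(poly))
print(shallow[0] is poly.points)  # True
```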
## Comparison and Hashing

### Equality Comparison

```python
from dataclasses import dataclass

@dataclass
class Product:
    sku: str
    price: float

# Create instances - O(n)
prod1 = Product("ABC123", 29.99)
prod2 = Product("ABC123", 29.99)
prod3 = Product("XYZ789", 15.99)

# Equality - O(n), compares all fields
print(prod1 == prod2)  # True
print(prod1 == prod3)  # False
```
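The generated `__eq__` has two further details. Fields declared with `field(compare=False)` are skipped. Comparing against an object of a different class returns `NotImplemented`, so a dataclass never equals a plain tuple holding the same values. In this sketch, the `note` field is a hypothetical addition.

```python
from dataclasses import dataclass, field

@dataclass
class Product:
    sku: str
    price: float
    # Hypothetical field, excluded from the generated __eq__
    note: str = field(default="", compare=False)

a = Product("ABC123", 29.99, note="restocked")
b = Product("ABC123", 29.99)
print(a == b)                  # True: note is not compared
print(a == ("ABC123", 29.99))  # False: other class, so __eq__ returns NotImplemented
```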
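### Order Comparison

With `order=True`, the generated methods compare instances as tuples of their fields in declaration order, so the first compared field dominates. Here `name` is excluded with `compare=False`, which makes `grade` the sort key.

```python
from dataclasses import dataclass, field

@dataclass(order=True)
class Student:
    # Excluded from <, >, <=, >= (and also from ==)
    name: str = field(compare=False)
    grade: float

# Create instances - O(n)
s1 = Student("Alice", 3.8)
s2 = Student("Bob", 3.5)

# Comparisons enabled - O(n) worst case over the compared fields
print(s1 > s2)   # True (3.8 > 3.5)
print(s1 < s2)   # False
print(s1 >= s2)  # True
```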
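`sorted()` works directly on `order=True` instances. A common pattern for sorting by a derived key is a hidden first field computed in `__post_init__`. In this sketch, `sort_key` is an illustrative name, and negating the grade gives descending order.

```python
from dataclasses import dataclass, field

@dataclass(order=True)
class Student:
    # Hypothetical helper field: compared first, hidden from __init__ and repr
    sort_key: float = field(init=False, repr=False)
    name: str
    grade: float

    def __post_init__(self):
        # Negate so that higher grades sort first
        self.sort_key = -self.grade

students = [Student("Bob", 3.5), Student("Alice", 3.8), Student("Cara", 3.9)]
ranked = sorted(students)
print([s.name for s in ranked])  # ['Cara', 'Alice', 'Bob']
```

If ordering is needed in only one place, `sorted(students, key=lambda s: s.grade)` achieves the same without `order=True`.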
### Hashable Dataclasses

```python
from dataclasses import dataclass

# frozen=True with the default eq=True generates a field-based __hash__
@dataclass(frozen=True)
class Coordinates:
    x: int
    y: int

# Create - O(n)
c1 = Coordinates(1, 2)
c2 = Coordinates(1, 2)

# Use in set - O(n) to hash the fields, then O(1) average lookup
locations = {c1, c2}  # Set deduplicates equal instances
print(len(locations))  # 1

# Use as dict key - same cost; c2 overwrites c1's entry
location_names = {c1: "start", c2: "origin"}
print(len(location_names))  # 1
```
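The `frozen=True` in the example above matters. With the default `eq=True` and no `frozen`, the decorator sets `__hash__` to `None`, so instances cannot be set members or dict keys. Passing `unsafe_hash=True` forces a hash anyway. The following sketch uses illustrative class names.

```python
from dataclasses import dataclass

# eq=True (the default) without frozen=True sets __hash__ to None
@dataclass
class MutablePoint:
    x: int
    y: int

print(MutablePoint.__hash__ is None)  # True

unhashable = False
try:
    {MutablePoint(1, 2)}
except TypeError:
    unhashable = True
    print("MutablePoint instances cannot go in a set")

# unsafe_hash=True forces a field-based __hash__ on a mutable class;
# mutating an instance while it sits in a set or dict breaks lookups
@dataclass(unsafe_hash=True)
class TaggedPoint:
    x: int
    y: int

print(len({TaggedPoint(1, 2), TaggedPoint(1, 2)}))  # 1
```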
## Advanced Patterns

### Inheritance

```python
from dataclasses import dataclass

@dataclass
class Animal:
    name: str
    age: int

# Inherited fields come first: Dog(name, age, breed)
@dataclass
class Dog(Animal):
    breed: str

# Create - O(n)
dog = Dog("Buddy", 3, "Golden Retriever")

# Access all fields - O(1)
print(dog.name, dog.age, dog.breed)  # Buddy 3 Golden Retriever
```
### Computed Fields (`init=False`)

```python
from dataclasses import dataclass, field

@dataclass
class Temperature:
    celsius: float
    # init=False: not an __init__ parameter; set in __post_init__
    fahrenheit: float = field(init=False)

    def __post_init__(self):
        self.fahrenheit = self.celsius * 9 / 5 + 32

# Create - O(n)
temp = Temperature(25.0)

# Access - O(1)
print(temp.celsius)     # 25.0
print(temp.fahrenheit)  # 77.0
```
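`field(init=False)` removes a field from `__init__`. The reverse case is a value that `__init__` accepts but that is not stored as a field. The module calls this an init-only variable: it is declared with `InitVar` and passed on to `__post_init__`. This sketch accepts Fahrenheit and stores Celsius; the direction of the conversion is only illustrative.

```python
from dataclasses import dataclass, field, fields, InitVar

@dataclass
class Temperature:
    celsius: float = field(init=False)
    # InitVar: an __init__ parameter passed to __post_init__, never stored
    fahrenheit: InitVar[float]

    def __post_init__(self, fahrenheit):
        self.celsius = (fahrenheit - 32) * 5 / 9

t = Temperature(77.0)
print(t)                                      # Temperature(celsius=25.0)
print([f.name for f in fields(Temperature)])  # ['celsius']
```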
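### Class Variables

```python
from dataclasses import dataclass
from typing import ClassVar

@dataclass
class Counter:
    count: int
    # ClassVar marks a class attribute, not a field; a plain
    # `instances: int = 0` would become an ordinary field with a default
    instances: ClassVar[int] = 0

    def __post_init__(self):
        Counter.instances += 1

# Create - O(n)
c1 = Counter(1)
c2 = Counter(2)

# Access class variable - O(1)
print(Counter.instances)  # 2
print(c1)  # Counter(count=1)
```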
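## Comparison: Dataclass vs Alternatives

### vs Regular Class

```python
# Regular class: every method written by hand
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Point(x={self.x!r}, y={self.y!r})"

    def __eq__(self, other):
        # Let Python try the reflected operation for unrelated types
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

# Dataclass: the same three methods are generated, with comparable per-call cost
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float
```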
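### vs NamedTuple

```python
from collections import namedtuple
from dataclasses import dataclass

# namedtuple: immutable tuple subclass, no per-instance __dict__
Point_NT = namedtuple('Point', ['x', 'y'])

# Dataclass: mutable by default; supports field options and __post_init__
@dataclass
class Point_DC:
    x: float
    y: float

# Speed: broadly comparable for creation and attribute access (measure with timeit)
# Memory: namedtuple instances are smaller than default dataclass instances,
#         which carry a __dict__; @dataclass(slots=True) (3.10+) narrows the gap
# Flexibility: dataclass wins
```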
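The practical difference behind these notes is that a namedtuple *is* a tuple. It can be indexed and unpacked, it compares equal to plain tuples, and it has no per-instance `__dict__`. A short check:

```python
from collections import namedtuple
from dataclasses import dataclass

Point_NT = namedtuple("Point", ["x", "y"])

@dataclass
class Point_DC:
    x: float
    y: float

nt = Point_NT(1.0, 2.0)
dc = Point_DC(1.0, 2.0)

# namedtuple: indexable, unpackable, equal to a plain tuple
x, y = nt
print(nt[0], nt == (1.0, 2.0))  # 1.0 True

# dataclass: attribute access only, never equal to a tuple
print(dc == (1.0, 2.0))  # False

# Only the dataclass instance carries a per-instance __dict__
print(hasattr(nt, "__dict__"), hasattr(dc, "__dict__"))  # False True
```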
### vs TypedDict

```python
from typing import TypedDict
from dataclasses import dataclass

# TypedDict: static type hints for a plain dict; no runtime validation
class PointDict(TypedDict):
    x: float
    y: float

# Dataclass: object with attributes and generated methods
@dataclass
class Point:
    x: float
    y: float

# Use cases:
# TypedDict: interop with JSON and other dict-based APIs
# Dataclass: internal data structures
```
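At runtime a `TypedDict` is just a `dict`. It therefore needs no conversion for JSON, but it also gets no validation. The dataclass, by contrast, must go through `asdict()` first. A sketch:

```python
import json
from dataclasses import dataclass, asdict
from typing import TypedDict

class PointDict(TypedDict):
    x: float
    y: float

@dataclass
class Point:
    x: float
    y: float

# A TypedDict value is an ordinary dict at runtime
pd: PointDict = {"x": 1.0, "y": 2.0}
print(type(pd) is dict)  # True
print(json.dumps(pd))    # {"x": 1.0, "y": 2.0}

# No runtime validation: only a static type checker would flag this
unchecked = PointDict(x="oops", y=2.0)
print(unchecked)         # {'x': 'oops', 'y': 2.0}

# A dataclass needs an explicit conversion before JSON encoding
print(json.dumps(asdict(Point(1.0, 2.0))))  # {"x": 1.0, "y": 2.0}
```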
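## Performance Notes

### Time Complexity

- Decorator: O(n) where n = number of fields, paid once at class definition
- Field access: O(1) (standard attribute lookup)
- Comparison: O(n) worst case (fields compared in declaration order)
- Conversion (`asdict`/`astuple`): O(N) where N = total fields and container elements across all nested objects

### Space Complexity

- Instance: O(n) for n fields
- `replace()`: O(n) for the new instance; `asdict()`/`astuple()`: O(N) for the recursive copy
- Inheritance: O(n), where n counts the fields from all classes in the hierarchy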
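### CPython Implementation

- `__slots__` is opt-in via `@dataclass(slots=True)` (Python 3.10+); by default, instances store fields in a per-instance `__dict__`
- Methods are generated once, when the decorator runs at class definition time
- Generated methods are ordinary Python functions, so their per-call overhead is comparable to equivalent hand-written methods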
## Best Practices

### Use Dataclasses For

- Data containers with multiple fields
- Configuration objects
- API request/response models
- Simple immutable records (with `frozen=True`)

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ApiRequest:
    method: str
    url: str
    headers: dict = field(default_factory=dict)  # fresh dict per request
    body: Optional[str] = None
```
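### Avoid When

- You need complex validation (use a hand-written `__init__`)
- Objects need frequent dynamic attributes
- Objects are very simple, with a single field
- Code is on a performance-critical hot path where per-instance overhead matters (profile first)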