Set Operations Complexity¶
The set type is an unordered collection of unique items. It's implemented as a hash table similar to dictionaries in CPython.
Time Complexity¶
| Operation | Time | Notes |
|---|---|---|
len() |
O(1) | Direct count |
add(x) |
O(1) avg, O(n) worst | Amortized; hash collisions cause O(n) |
remove(x) |
O(1) avg, O(n) worst | Hash lookup + delete |
discard(x) |
O(1) avg, O(n) worst | Hash lookup + delete |
pop() |
O(1) avg | Remove arbitrary element |
clear() |
O(n) | Deallocate all |
x in set |
O(1) avg, O(n) worst | Hash lookup; collisions cause O(n) |
copy() |
O(n) | Shallow copy |
union(other) |
O(n+m) | n, m = set lengths |
intersection(other) |
O(min(n,m)) | Iterate smaller set |
difference(other) |
O(n) | n = set length |
symmetric_diff(other) |
O(n+m) | Combined set ops |
issubset() |
O(n) | Check all elements |
issuperset() |
O(m) | m = other length |
isdisjoint() |
O(min(n,m)) | Early termination |
Space Complexity¶
| Operation | Space |
|---|---|
add() |
O(1) amortized, O(n) worst |
copy() |
O(n) |
union() |
O(n+m) for result |
intersection() |
O(min(n,m)) for result |
difference() |
O(n) for result |
Implementation Details¶
Hash Table Implementation¶
Sets use the same hash table design as dictionaries, but:
- Only stores keys (no values)
- More memory efficient than dict
- Same O(1) average case lookup
Set Operations¶
# Union: combines both sets
{1, 2} | {2, 3} # {1, 2, 3} - O(len(s1) + len(s2))
# Intersection: common elements
{1, 2, 3} & {2, 3, 4} # {2, 3} - O(min(len(s1), len(s2)))
# Difference: elements in first but not second
{1, 2, 3} - {2, 4} # {1, 3} - O(len(s1))
# Symmetric difference: elements in either but not both
{1, 2} ^ {2, 3} # {1, 3} - O(len(s1) + len(s2))
Membership Testing¶
# Very fast - O(1) hash lookup
s = {1, 2, 3, 4, 5}
if 3 in s: # O(1), not O(n)
pass
Comparison with Lists¶
# List membership: O(n) - must scan entire list
numbers_list = [1, 2, 3, 4, 5]
3 in numbers_list # O(n)
# Set membership: O(1) - hash lookup
numbers_set = {1, 2, 3, 4, 5}
3 in numbers_set # O(1) - much faster for large collections!
Version Notes¶
- All Python 3 versions: Core complexity unchanged
- Python 3.9+: New set union/intersection operators
Implementation Comparison¶
CPython¶
Standard hash table implementation.
PyPy¶
JIT compilation may provide additional optimization.
Jython¶
Underlying Java HashSet, same O(1) characteristics.
Best Practices¶
✅ Do:
- Use sets for membership testing with large collections
- Use set operations (
|,&,-,^) for combining sets - Use sets to remove duplicates:
set(list_with_dups) - Use
frozensetfor hashable unique items
❌ Avoid:
- Using lists for frequent membership checks
- Relying on set order (not guaranteed)
- Unhashable types (lists, dicts) in sets
Common Patterns¶
Remove Duplicates¶
# Bad: preserves list, but O(n²)
unique = []
for item in items:
if item not in unique:
unique.append(item)
# Good: O(n), but loses order
unique = list(set(items))
# Best: O(n) and preserves order (Python 3.7+)
unique = list(dict.fromkeys(items))
Fast Filtering¶
# Bad: O(n*m) - checks membership in list for each element
large_list = list(range(1000000))
exclusions = [1, 2, 3, ...]
filtered = [x for x in large_list if x not in exclusions]
# Good: O(n) - fast set lookup
exclusions_set = set(exclusions)
filtered = [x for x in large_list if x not in exclusions_set]