Skip to content

hash() Function Complexity

The hash() function returns the hash value of an object, which is used by dictionaries and sets for fast lookups.

Complexity Reference

Operation Time Space Notes
hash(object) O(k) O(1) k = object size for strings/tuples; O(1) for int/float
Hash of int/float O(1) O(1) Fixed-size representation
Hash of string O(n) O(1) n = string length; cached after first call
Hash of tuple O(n) O(1) n = tuple length, combines element hashes
Hash of mutable Error - e.g., list, dict, set (unhashable)
Dictionary lookup O(1) avg O(1) O(n) worst case with hash collisions
Set membership O(1) avg O(1) O(n) worst case with hash collisions

Hash Basics

Creating Hashes

# Hash immutable types
hash(42)                    # O(1) - integer hashing is constant time
hash(3.14)                  # O(1) - float hashing is constant time
hash("hello")               # O(n) first call, O(1) cached - string length
hash((1, 2, 3))            # O(n) - tuple size, hashes each element

# Same value = same hash
hash("hello") == hash("hello")  # True

# Different values = different hash (usually)
hash("hello") != hash("world")  # Usually true

Unhashable Types

# Lists are unhashable (mutable)
try:
    hash([1, 2, 3])  # TypeError!
except TypeError as e:
    print("unhashable type: 'list'")

# Dictionaries are unhashable
try:
    hash({'a': 1})  # TypeError!
except TypeError:
    print("unhashable type: 'dict'")

# Sets are unhashable
try:
    hash({1, 2, 3})  # TypeError!
except TypeError:
    print("unhashable type: 'set'")

# Frozen sets ARE hashable
hash(frozenset([1, 2, 3]))  # OK - O(3)

Hash-based Collections

Dictionary Lookups

# Dictionary uses hash for O(1) lookup
d = {'key1': 'value1', 'key2': 'value2'}

# Lookup - O(1) average case
value = d['key1']  # O(1) - uses hash('key1')

# Insert - O(1) average case
d['key3'] = 'value3'  # O(1) - uses hash('key3')

# Delete - O(1) average case
del d['key1']  # O(1) - uses hash('key1')

# Membership - O(1) average case
if 'key2' in d:  # O(1) - uses hash('key2')
    pass

Set Operations

# Set uses hash for O(1) membership
s = {1, 2, 3, 4, 5}

# Add - O(1) average
s.add(6)  # O(1) - uses hash(6)

# Remove - O(1) average
s.discard(3)  # O(1) - uses hash(3)

# Membership - O(1) average
if 4 in s:  # O(1) - uses hash(4)
    pass

# Set intersection (requires hashing) - O(min(len(s1), len(s2)))
s1 = {1, 2, 3}
s2 = {2, 3, 4}
result = s1 & s2  # O(min(3, 3)) = O(3)

Hash Values

Hash Stability

# Python 3.3+: Hash randomization enabled
# Same object, same hash within session
x = 42
h1 = hash(x)
h2 = hash(x)
h1 == h2  # True - within same Python session

# Different sessions may have different hashes
# This prevents hash collision attacks

# Strings hash differently each session (Python 3.3+)
h1 = hash("hello")
# Restart Python
h2 = hash("hello")  # Different value (secure)

Hash Collisions

# Different values can have same hash (collision)
# Dictionaries handle this with probing or chaining

# Example: hash collision (rare)
# If hash(obj1) == hash(obj2), dict stores both
# and compares keys for equality

d = {}
d['a'] = 1
d['b'] = 2

# Even if hash('a') == hash('b'),
# d['a'] is still unique because 'a' != 'b'

Custom Hashing

hash and eq

# Custom hashable class
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __hash__(self):
        # Must return hash of immutable representation
        return hash((self.name, self.age))  # O(name_len + 1)

    def __eq__(self, other):
        # If hash same, equality test used to confirm
        if not isinstance(other, Person):
            return False
        return self.name == other.name and self.age == other.age  # O(name_len)

# Usage
p = Person('Alice', 30)
s = {p}  # Can add to set - uses __hash__()
d = {p: 'data'}  # Can use as key - uses __hash__()

Important Rule: hash and eq Agreement

# Rule: if a == b, then hash(a) == hash(b)

class GoodClass:
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return self.value == other.value

    def __hash__(self):
        return hash(self.value)

# Rule follows: equal objects have equal hashes
a = GoodClass(5)
b = GoodClass(5)
assert a == b  # True
assert hash(a) == hash(b)  # True - good!

class BadClass:
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return self.value == other.value

    def __hash__(self):
        return 42  # Wrong! Not based on value

# Rule violated: equal objects don't have equal hashes
a = BadClass(5)
b = BadClass(5)
assert a == b  # True
assert hash(a) == hash(b)  # True (both 42) but coincidence!
d = {a: 1}
d[b] += 1  # Oops, creates new key instead of updating!

Hashing Different Types

Immutable Types

# All hashable efficiently - O(size)
hash(None)             # O(1) - special case
hash(True)             # O(1) - boolean
hash(42)               # O(1) - small integer
hash(12345678901234)   # O(1) - large integer  
hash(3.14)             # O(1) - float
hash("hello")          # O(5) - string length
hash(b"hello")         # O(5) - bytes length
hash((1, 2, 3))        # O(3) - tuple depth
hash(frozenset([1,2])) # O(2) - frozenset size

Integers

# Integer hashing - O(1)
hash(0)      # 0
hash(1)      # 1
hash(-1)     # -2
hash(256)    # 256

# Small integers cached for performance
# hash(n) usually returns n itself for small values

# Large integers
big = 10**100
hash(big)  # O(1) due to Python's optimization

Strings

# String hashing - O(len(string))
hash("")      # O(0)
hash("a")     # O(1)
hash("hello") # O(5)

# String intern: equal strings have equal hashes
s1 = "hello"
s2 = "hello"
hash(s1) == hash(s2)  # True - O(5)

Tuples

# Tuple hashing combines element hashes - O(tuple_size)
hash((1,))         # O(1)
hash((1, 2, 3))    # O(3) - hashes all elements
hash(())           # O(0) - empty tuple

# If tuple contains unhashable, fails
try:
    hash((1, [2, 3]))  # TypeError! list is unhashable
except TypeError:
    pass

Using Hash for Caching

Memoization with Hash

# Cache expensive computation using hash
cache = {}

def expensive_function(arg):
    key = hash(arg)  # O(k) for hashing

    if key in cache:  # O(1) lookup
        return cache[key]

    result = do_expensive_work(arg)  # Slow
    cache[key] = result  # O(1) store
    return result

# Multiple calls with same hash are instant
result1 = expensive_function(("data", 1))  # Computed
result2 = expensive_function(("data", 1))  # From cache O(1)

Set Deduplication

# Remove duplicates using hash - O(n)
numbers = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
unique = set(numbers)  # O(n) - uses hash for each

# For unhashable types, need different approach
data = [[1, 2], [1, 2], [3, 4]]
# Can't use set directly
# Instead convert to hashable:
unique = set(tuple(d) for d in data)  # O(n)

Performance Notes

Hash Collisions and Worst Case

# Average case: O(1) dictionary access
# Worst case: O(n) if all items hash to same value

# Python mitigates this with:
# 1. Good hash functions
# 2. Randomized hashing (Python 3.3+)
# 3. Open addressing or chaining

# You shouldn't encounter O(n) in practice
d = {}
for i in range(1000):
    d[i] = i  # O(1) each, not O(n)

Hash vs Equality

# Hash is used first for efficiency
# If hash matches, equality check confirms

# This matters for custom classes:
class Efficient:
    def __init__(self, data):
        self.data = data

    def __hash__(self):
        return 42  # All same hash

    def __eq__(self, other):
        return self.data == other.data  # Expensive comparison
        # Only called if hashes match

# With 1000 unique objects:
s = {Efficient(i) for i in range(1000)}
# Each insertion: O(1) hash, then O(1) equality
# Total: O(1000) insertions, each O(1) if no collisions

Avoiding Hashing

Use frozenset Instead of set

# For hashable collections, use frozenset
frozen = frozenset([1, 2, 3])
hash(frozen)  # Works! Can use as dict key

# Mutable set can't be hashed
regular = {1, 2, 3}
try:
    hash(regular)  # TypeError!
except TypeError:
    pass

# So frozenset can be dict key/set member
d = {frozenset([1, 2]): 'value'}  # Works
s = {frozenset([3, 4])}  # Works

Version Notes

  • Python 2.x: Different hash implementation
  • Python 3.3+: Hash randomization for security
  • All versions: Hash of immutable types is stable
  • id() - Object identity (different from hash)
  • set() - Uses hash for membership
  • dict - Uses hash for key lookup

Best Practices

Do:

  • Implement __hash__() and __eq__() together for custom classes
  • Ensure hash agrees with equality: if a == b then hash(a) == hash(b)
  • Use frozen data structures for dict keys/set members
  • Use frozenset instead of set when you need hashability

Avoid:

  • Implementing __hash__() without __eq__() (or vice versa)
  • Relying on specific hash values (they can change between Python versions)
  • Hashing mutable objects (they're unhashable by design)
  • Implementing __hash__() that depends on mutable state
  • Assuming hash(a) == hash(b) means a == b (can be false due to collisions)