PyPy Implementation Details¶
PyPy is an alternative Python implementation written in Python with a JIT (Just-In-Time) compiler. It aims for high performance while maintaining compatibility with CPython.
JIT Compilation¶
Warm-up Period¶
# PyPy behavior: Initial runs slower (compilation), then faster
# Function definition: not compiled yet
def fibonacci(n):
if n <= 1:
return n
return fibonacci(n-1) + fibonacci(n-2)
# First call: slow (interpreter)
result = fibonacci(5)
# Calls 2-100: Profiling happens
for i in range(100):
result = fibonacci(5)
# Call 101+: JIT compiled and fast
# ~100-1000x faster than CPython for hot loops!
Tracing JIT¶
PyPy uses trace-based JIT:
# Loop gets compiled to machine code
total = 0
for i in range(1000000): # This loop gets JIT compiled!
total += i
print(total) # Executes in compiled machine code
Performance Characteristics¶
Loop Optimization¶
Tight loops benefit most from JIT:
# PyPy: ~100x faster than CPython
# CPython: O(n) interpreted
# PyPy: O(n) compiled to machine code
def sum_range(n):
total = 0
for i in range(n):
total += i
return total
# PyPy: 100-1000x faster after warm-up
# CPython: Baseline interpreter speed
Object Allocation¶
# PyPy optimizes common patterns
class Point:
def __init__(self, x, y):
self.x = x
self.y = y
# Creates millions of points
points = [Point(i, i+1) for i in range(1000000)]
# PyPy: Optimizes object allocation in comprehension
# CPython: Normal allocation overhead
Complexity Comparison¶
Standard Operations¶
| Operation | CPython | PyPy | Notes |
|---|---|---|---|
list.append() |
O(1) amortized | O(1) amortized | Same algorithm |
dict[key] |
O(1) avg, O(n) worst | O(1) avg, O(n) worst | Same hash table |
set in |
O(1) avg, O(n) worst | O(1) avg, O(n) worst | Same hash set |
| Loop (tight) | O(n) | O(n)* | PyPy much faster |
*Amortized or constant with much lower constant
When PyPy Excels¶
CPU-Bound Code¶
# PyPy is 10-100x faster for CPU-bound tasks
# After warm-up period (~100ms-1s)
import time
def heavy_computation():
total = 0
for i in range(100000):
for j in range(100):
total += i * j
return total
# CPython: ~seconds
# PyPy: ~milliseconds
Long-Running Servers¶
PyPy excellent for servers:
- Initial startup slightly slower
- Pays off in hours/days of running
- Can handle 10-50x more requests
Scientific Computing¶
For pure Python algorithms without NumPy:
# Pure Python algorithm (no NumPy)
# PyPy: 10-100x faster
# CPython: Slower
def matrix_multiply(a, b):
size = len(a)
c = [[0] * size for _ in range(size)]
for i in range(size):
for j in range(size):
for k in range(size):
c[i][j] += a[i][k] * b[k][j]
return c
When CPython is Better¶
Startup Performance¶
# Quick scripts: CPython starts faster
# PyPy: 200-500ms startup overhead
# CPython: 50-100ms startup
# For quick scripts, CPython preferred
C Extension Compatibility¶
# NumPy, pandas, etc. use C extensions
import numpy as np
# NumPy: Requires CPython (no PyPy support)
# PyPy: Limited C extension support
Mixed Workloads¶
# Quick initialization + short run
# Neither JIT nor startup offset the cost
# CPython better for:
# - Scripts that run once and exit
# - Mixed I/O and CPU work
# - One-off data processing
Memory Behavior¶
Allocation Strategy¶
PyPy uses different GC strategy:
# PyPy: Generational GC (no reference counting overhead)
# CPython: Reference counting + GC
# Creating many temporary objects:
# PyPy may be faster (no ref count updates)
# CPython may have more pause time (GC collection)
for i in range(1000000):
temp_list = [i] # Allocate and discard
# PyPy: GC handles efficiently
# CPython: Reference count decremented
Practical Recommendations¶
Use PyPy When:¶
- ✅ CPU-bound code with loops
- ✅ Long-running processes (servers)
- ✅ No dependency on C extensions
- ✅ Performance critical
Use CPython When:¶
- ✅ Needs C extension libraries (NumPy, pandas, etc.)
- ✅ Quick startup important
- ✅ Third-party packages have poor PyPy support
- ✅ Standard approach expected in team
Migration from CPython to PyPy¶
# Usually just works!
# PyPy aims for 99% compatibility
# Check compatibility
pip install pypy3
# Run code
pypy3 your_script.py
# Some packages may not support PyPy:
pip install package-name # May fail on PyPy
Version Information¶
| PyPy Version | Python Version | Performance |
|---|---|---|
| PyPy3.9 | Python 3.9 | Good |
| PyPy3.10 | Python 3.10 | Excellent |
| PyPy3.11 | Python 3.11 | Excellent+ |
Latest versions provide best performance.
Benchmarking¶
import time
# Proper PyPy benchmark
def benchmark(func, *args):
# Warm-up: allow JIT compilation
for _ in range(100):
func(*args)
# Timed run
start = time.time()
for _ in range(10000):
func(*args)
elapsed = time.time() - start
print(f"Time: {elapsed:.3f}s")
# Without warm-up, numbers misleading!