CONCEPTS
📈 Scalability
- ❯ Horizontal Scaling
- ❯ Vertical Scaling
- ❯ Load Balancing
- ❯ Auto-scaling
🛡️ Reliability
- ❯ Fault Tolerance
- ❯ Redundancy
- ❯ Replication
- ❯ Disaster Recovery
⚡ Performance
- ❯ Caching
- ❯ CDN
- ❯ Database Optimization
- ❯ Async Processing
🔄 Availability
- ❯ High Availability
- ❯ Failover
- ❯ Health Checks
- ❯ Circuit Breakers
COMPONENTS
01
Load Balancer
NGINX, HAProxy, AWS ELB
Distribute traffic across servers
02
Cache
Redis, Memcached, CDN
Store frequently accessed data
03
Message Queue
RabbitMQ, Kafka, SQS
Async communication between services
04
Database
PostgreSQL, MongoDB, Cassandra
Persistent data storage
05
API Gateway
Kong, AWS API Gateway, Apigee
Single entry point for APIs
06
Service Mesh
Istio, Linkerd, Consul
Service-to-service communication
PATTERNS
Microservices
Decompose application into small, independent services
Event-Driven
Services communicate through events and message queues
CQRS
Separate read and write operations for better performance
Saga Pattern
Manage distributed transactions across microservices
Circuit Breaker
Prevent cascading failures in distributed systems
API Gateway
Single entry point for all client requests
SYNTAX_DEMO
architecture.py
# ============================================
# SYSTEM DESIGN FUNDAMENTALS
# ============================================
# CAP THEOREM
# In a distributed system, you can only guarantee 2 of 3:
# - Consistency: All nodes see the same data
# - Availability: Every request gets a response
# - Partition Tolerance: System works despite network failures
# Examples:
# CA: Traditional RDBMS (PostgreSQL, MySQL)
# CP: MongoDB, HBase, Redis
# AP: Cassandra, DynamoDB, CouchDB
# ACID vs BASE
# ACID (Traditional Databases)
# - Atomicity: All or nothing transactions
# - Consistency: Data integrity maintained
# - Isolation: Concurrent transactions don't interfere
# - Durability: Committed data persists
# BASE (NoSQL Databases)
# - Basically Available: System available most of the time
# - Soft state: State may change over time
# - Eventually consistent: System becomes consistent eventually
# ============================================
# SCALABILITY PATTERNS
# ============================================
# HORIZONTAL SCALING (Scale Out)
# Add more servers to handle increased load
# Pros: Better fault tolerance, unlimited scaling
# Cons: Complex, data consistency challenges
# Example Architecture:
"""
Client → Load Balancer → [Server 1, Server 2, Server 3, ...]
                                  ↓
                          Shared Database
"""
# VERTICAL SCALING (Scale Up)
# Add more resources (CPU, RAM) to existing server
# Pros: Simple, no code changes
# Cons: Hardware limits, single point of failure
# LOAD BALANCING ALGORITHMS
# 1. Round Robin
servers = ["server1", "server2", "server3"]
current = 0
def round_robin():
    global current
    server = servers[current]
    current = (current + 1) % len(servers)
    return server
# 2. Least Connections
server_connections = {
    "server1": 5,
    "server2": 3,
    "server3": 7,
}

def least_connections():
    return min(server_connections, key=server_connections.get)
# 3. Weighted Round Robin
servers_with_weights = [
    ("server1", 3),  # 3x capacity
    ("server2", 2),  # 2x capacity
    ("server3", 1),  # 1x capacity
]
# ============================================
# CACHING STRATEGIES
# ============================================
# CACHE-ASIDE (Lazy Loading)
def get_user(user_id):
    # Check cache first
    user = cache.get(f"user:{user_id}")
    if user is None:
        # Cache miss - fetch from database
        # (parameterized query to avoid SQL injection)
        user = database.query("SELECT * FROM users WHERE id = %s", (user_id,))
        # Store in cache
        cache.set(f"user:{user_id}", user, ttl=3600)
    return user
# WRITE-THROUGH CACHE
def update_user(user_id, data):
    # Update database
    database.update("UPDATE users SET ... WHERE id = %s", (user_id,))
    # Update cache
    cache.set(f"user:{user_id}", data, ttl=3600)
# WRITE-BEHIND (Write-Back) CACHE
def update_user_async(user_id, data):
    # Update cache immediately
    cache.set(f"user:{user_id}", data, ttl=3600)
    # Queue database update for later
    queue.enqueue("update_user", user_id, data)
# CACHE EVICTION POLICIES
# - LRU (Least Recently Used)
# - LFU (Least Frequently Used)
# - FIFO (First In, First Out)
# - TTL (Time To Live)
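To make the first policy concrete, here is a minimal LRU cache sketch built on `collections.OrderedDict` (the class name `LRUCache` is illustrative; production systems would use Redis's `maxmemory-policy allkeys-lru` or `functools.lru_cache` rather than hand-rolling this):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # insertion order tracks recency

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            # Evict the least recently used entry (front of the dict)
            self.data.popitem(last=False)
```

LFU would instead track access counts, and FIFO would evict in pure insertion order without the `move_to_end` bump on reads.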
# ============================================
# DATABASE DESIGN
# ============================================
# DATABASE SHARDING (Horizontal Partitioning)
# Split data across multiple databases
# Hash-based Sharding
def get_shard(user_id, num_shards=4):
    return hash(user_id) % num_shards

# Range-based Sharding
def get_shard_by_range(user_id):
    if user_id < 1000000:
        return "shard_1"
    elif user_id < 2000000:
        return "shard_2"
    else:
        return "shard_3"
# DATABASE REPLICATION
# Master-Slave Replication
"""
Master (Write) → Slave 1 (Read)
               → Slave 2 (Read)
               → Slave 3 (Read)
"""
# Master-Master Replication
"""
Master 1 ←→ Master 2
(Both read and write)
"""
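The master-slave diagram implies a read/write split in the application. A minimal routing sketch, assuming a hypothetical `ReplicatedDB` wrapper (real drivers such as psycopg or the MongoDB client expose read-preference settings for this instead):

```python
import random

class ReplicatedDB:
    def __init__(self, master, replicas):
        self.master = master
        self.replicas = replicas

    def route(self, query):
        # Writes must go to the master; reads can be spread across replicas.
        is_write = query.strip().upper().startswith(
            ("INSERT", "UPDATE", "DELETE")
        )
        return self.master if is_write else random.choice(self.replicas)
```

One caveat the diagram hides: replication lag means a read routed to a replica may briefly miss a write that just succeeded on the master (read-your-writes is not guaranteed).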
# DATABASE INDEXING
"""
CREATE INDEX idx_user_email ON users(email);
CREATE INDEX idx_order_date ON orders(created_at);
CREATE INDEX idx_composite ON users(last_name, first_name);
"""
# ============================================
# MESSAGE QUEUES
# ============================================
# PRODUCER-CONSUMER PATTERN
import queue
import threading
message_queue = queue.Queue()
def producer():
    for i in range(10):
        message = f"Message {i}"
        message_queue.put(message)
        print(f"Produced: {message}")

def consumer():
    while True:
        message = message_queue.get()
        print(f"Consumed: {message}")
        # Process message, then mark it as done
        message_queue.task_done()
# Start producer and consumer threads
threading.Thread(target=producer).start()
threading.Thread(target=consumer, daemon=True).start()
# KAFKA-LIKE ARCHITECTURE
"""
Producer → Topic (Partitions) → Consumer Group
                                      ↓
                 [Consumer 1, Consumer 2, Consumer 3]
"""
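In this architecture, which partition a message lands on is decided by hashing its key, so all messages for one key stay ordered on one partition. A minimal sketch of that assignment (the function name `assign_partition` is illustrative; Kafka's actual default partitioner uses murmur2, not MD5):

```python
import hashlib

def assign_partition(key, num_partitions):
    # Stable hash so the same key always maps to the same partition,
    # preserving per-key ordering. MD5 is used only as a cheap,
    # deterministic fingerprint here, not for security.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_partitions
```

Within a consumer group, each partition is consumed by exactly one consumer, which is why per-key ordering survives parallel consumption.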
# ============================================
# MICROSERVICES PATTERNS
# ============================================
# API GATEWAY PATTERN
"""
Client → API Gateway → [Auth Service, User Service, Order Service]
             ↓
  [Rate Limiting, Logging, Caching]
"""
# SERVICE DISCOVERY
# Services register themselves and discover other services
import random

service_registry = {
    "user-service": ["http://10.0.1.1:8080", "http://10.0.1.2:8080"],
    "order-service": ["http://10.0.2.1:8080"],
    "payment-service": ["http://10.0.3.1:8080"],
}

def discover_service(service_name):
    instances = service_registry.get(service_name, [])
    # Return a random instance (simple client-side load balancing)
    return random.choice(instances) if instances else None
# CIRCUIT BREAKER PATTERN
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, timeout=60):
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.state = "CLOSED"  # CLOSED, OPEN, HALF_OPEN
        self.last_failure_time = None

    def call(self, func, *args, **kwargs):
        if self.state == "OPEN":
            if time.time() - self.last_failure_time > self.timeout:
                # Timeout elapsed: allow one trial request through
                self.state = "HALF_OPEN"
            else:
                raise Exception("Circuit breaker is OPEN")
        try:
            result = func(*args, **kwargs)
            self.on_success()
            return result
        except Exception:
            self.on_failure()
            raise

    def on_success(self):
        self.failure_count = 0
        self.state = "CLOSED"

    def on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.state = "OPEN"
# ============================================
# RATE LIMITING
# ============================================
# TOKEN BUCKET ALGORITHM
import time
class TokenBucket:
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.tokens = capacity
        self.refill_rate = refill_rate  # tokens added per second
        self.last_refill = time.time()

    def consume(self, tokens=1):
        self._refill()
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False

    def _refill(self):
        now = time.time()
        elapsed = now - self.last_refill
        tokens_to_add = elapsed * self.refill_rate
        self.tokens = min(self.capacity, self.tokens + tokens_to_add)
        self.last_refill = now
# Usage
rate_limiter = TokenBucket(capacity=100, refill_rate=10)  # 10 tokens/sec
if rate_limiter.consume():
    # Process request
    pass
else:
    # Reject request (429 Too Many Requests)
    pass
# SLIDING WINDOW RATE LIMITING
from collections import deque
import time
class SlidingWindowRateLimiter:
    def __init__(self, max_requests, window_size):
        self.max_requests = max_requests
        self.window_size = window_size  # seconds
        self.requests = deque()

    def allow_request(self):
        now = time.time()
        # Remove old requests outside the window
        while self.requests and self.requests[0] < now - self.window_size:
            self.requests.popleft()
        if len(self.requests) < self.max_requests:
            self.requests.append(now)
            return True
        return False
# ============================================
# SYSTEM DESIGN EXAMPLES
# ============================================
# URL SHORTENER (like bit.ly)
"""
Requirements:
- Generate short URL from long URL
- Redirect short URL to original URL
- Handle billions of URLs
- Low latency
Design:
1. Hash Function: Generate unique short code
2. Database: Store mapping (short_code → long_url)
3. Cache: Redis for frequently accessed URLs
4. Load Balancer: Distribute traffic
Architecture:
Client → Load Balancer → [App Servers] → Cache (Redis)
                                       → Database (Sharded)
Short Code Generation:
- Base62 encoding (a-z, A-Z, 0-9) = 62^7 = 3.5 trillion URLs
- MD5 hash + take first 7 characters
- Auto-increment ID + Base62 encode
"""
import hashlib

def generate_short_code(long_url):
    # MD5 hash of the URL (non-cryptographic use: just a cheap fingerprint)
    hash_hex = hashlib.md5(long_url.encode()).hexdigest()
    # Take the first 7 characters as the short code
    return hash_hex[:7]
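The design notes also mention the third option, "Auto-increment ID + Base62 encode", which avoids the hash-collision problem entirely. A minimal sketch (the alphabet ordering here is one common convention; any fixed ordering of the 62 characters works as long as encode and decode agree):

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(n):
    # Convert a non-negative integer (e.g. an auto-increment row ID)
    # into a Base62 string: repeated divmod by 62, digits collected
    # least-significant first, then reversed.
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))
```

Because IDs are unique, so are the codes; the trade-off versus hashing is that sequential codes are guessable unless you add an offset or shuffle step.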
# TWITTER FEED DESIGN
"""
Requirements:
- Post tweets
- Follow users
- View timeline (tweets from followed users)
- Handle millions of users
Design:
1. Tweet Service: Create and store tweets
2. Timeline Service: Generate user timelines
3. Fan-out Service: Push tweets to followers' timelines
Architecture:
User → API Gateway → Tweet Service → Database
                   → Timeline Service → Cache (Redis)
                   → Fan-out Service → Message Queue
Timeline Generation:
- Fan-out on write: Pre-compute timelines (fast reads)
- Fan-out on read: Compute on demand (slow reads, less storage)
- Hybrid: Fan-out for most users, on-demand for celebrities
"""
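The fan-out-on-write strategy described above can be sketched in a few lines. This is a toy in-memory model (the `followers` and `timelines` structures stand in for a social graph store and per-user Redis lists):

```python
from collections import defaultdict

# who follows whom: author -> list of followers (toy data)
followers = {"alice": ["bob", "carol"]}

# precomputed per-user timelines, newest tweet first
timelines = defaultdict(list)

def post_tweet(author, tweet):
    # Fan-out on write: at post time, push the tweet into every
    # follower's timeline so their reads are a single list fetch.
    for follower in followers.get(author, []):
        timelines[follower].insert(0, (author, tweet))

post_tweet("alice", "hello world")
```

The celebrity problem is visible here: one post by a user with 50M followers means 50M writes, which is exactly why the hybrid approach falls back to fan-out-on-read for high-follower accounts.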
# NETFLIX VIDEO STREAMING
"""
Requirements:
- Upload and store videos
- Stream videos to millions of users
- Adaptive bitrate streaming
- Low latency
Design:
1. Upload Service: Process and encode videos
2. CDN: Distribute content globally
3. Streaming Service: Serve video chunks
4. Recommendation Service: Suggest content
Architecture:
User → CDN (Edge Servers) → Origin Servers
                          → Encoding Service
                          → Storage (S3)
Video Processing:
- Transcode to multiple resolutions (1080p, 720p, 480p)
- Split into chunks (HLS/DASH)
- Store in CDN for fast delivery
"""
# ============================================
# MONITORING & OBSERVABILITY
# ============================================
# Key Metrics to Monitor:
# - Latency (p50, p95, p99)
# - Throughput (requests/sec)
# - Error rate (%)
# - CPU/Memory usage
# - Database connections
# - Cache hit rate
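Since p50/p95/p99 latency leads that list, here is a minimal nearest-rank percentile sketch (monitoring systems like Prometheus estimate these from histograms instead of sorting raw samples, but the idea is the same):

```python
import math

def percentile(samples, pct):
    # Nearest-rank percentile: sort the samples and take the value
    # at rank ceil(pct% of n). p99 >> p50 signals a long tail.
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]
```

Averages hide tail latency; a service can have a fine mean while 1% of users wait seconds, which is why p95/p99 are monitored separately.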
# Logging Levels:
# DEBUG → INFO → WARNING → ERROR → CRITICAL
# Distributed Tracing:
# Track requests across multiple services
# Tools: Jaeger, Zipkin, OpenTelemetry
# Health Checks:
def health_check():
    # check_database/check_cache/check_queue are app-specific probes
    checks = {
        "database": check_database(),
        "cache": check_cache(),
        "queue": check_queue(),
    }
    all_healthy = all(checks.values())
    status = "healthy" if all_healthy else "unhealthy"
    return {"status": status, "checks": checks}
EXERCISES
Design URL Shortener (like bit.ly) (medium)
Design Twitter/Social Media Feed (hard)
Design Netflix/Video Streaming (hard)
Design Uber/Ride Sharing System (hard)
Design Distributed Cache (hard)