CONCEPTS
📈 Scalability
- ❯ Horizontal Scaling
- ❯ Vertical Scaling
- ❯ Load Balancing
- ❯ Auto-scaling
🛡️ Reliability
- ❯ Fault Tolerance
- ❯ Redundancy
- ❯ Replication
- ❯ Disaster Recovery
⚡ Performance
- ❯ Caching
- ❯ CDN
- ❯ Database Optimization
- ❯ Async Processing
🔄 Availability
- ❯ High Availability
- ❯ Failover
- ❯ Health Checks
- ❯ Circuit Breakers
COMPONENTS
01
Load Balancer
NGINX, HAProxy, AWS ELB
Distribute traffic across servers
02
Cache
Redis, Memcached, CDN
Store frequently accessed data
03
Message Queue
RabbitMQ, Kafka, SQS
Async communication between services
04
Database
PostgreSQL, MongoDB, Cassandra
Persistent data storage
05
API Gateway
Kong, AWS API Gateway, Apigee
Single entry point for APIs
06
Service Mesh
Istio, Linkerd, Consul
Service-to-service communication
PATTERNS
Microservices
Decompose application into small, independent services
Event-Driven
Services communicate through events and message queues
CQRS
Separate read and write operations for better performance
Saga Pattern
Manage distributed transactions across microservices
Circuit Breaker
Prevent cascading failures in distributed systems
API Gateway
Single entry point for all client requests
SYNTAX_DEMO
architecture.py
# ============================================
# SYSTEM DESIGN FUNDAMENTALS
# ============================================
# CAP THEOREM
# In a distributed system, you can only guarantee 2 of 3:
# - Consistency: All nodes see the same data
# - Availability: Every request gets a response
# - Partition Tolerance: System works despite network failures
# Examples:
# CA: Traditional RDBMS (PostgreSQL, MySQL)
# CP: MongoDB, HBase, Redis
# AP: Cassandra, DynamoDB, CouchDB
# ACID vs BASE
# ACID (Traditional Databases)
# - Atomicity: All or nothing transactions
# - Consistency: Data integrity maintained
# - Isolation: Concurrent transactions don't interfere
# - Durability: Committed data persists
# BASE (NoSQL Databases)
# - Basically Available: System available most of the time
# - Soft state: State may change over time
# - Eventually consistent: System becomes consistent eventually
# ============================================
# SCALABILITY PATTERNS
# ============================================
# HORIZONTAL SCALING (Scale Out)
# Add more servers to handle increased load
# Pros: Better fault tolerance, unlimited scaling
# Cons: Complex, data consistency challenges
# Example Architecture:
"""
Client → Load Balancer → [Server 1, Server 2, Server 3, ...]
                                  ↓
                          Shared Database
"""
# VERTICAL SCALING (Scale Up)
# Add more resources (CPU, RAM) to existing server
# Pros: Simple, no code changes
# Cons: Hardware limits, single point of failure
# LOAD BALANCING ALGORITHMS
# 1. Round Robin
servers = ["server1", "server2", "server3"]
current = 0
def round_robin():
    global current
    server = servers[current]
    current = (current + 1) % len(servers)
    return server
# 2. Least Connections
server_connections = {
    "server1": 5,
    "server2": 3,
    "server3": 7,
}

def least_connections():
    return min(server_connections, key=server_connections.get)
# 3. Weighted Round Robin
servers_with_weights = [
    ("server1", 3),  # 3x capacity
    ("server2", 2),  # 2x capacity
    ("server3", 1),  # 1x capacity
]
# ============================================
# CACHING STRATEGIES
# ============================================
# CACHE-ASIDE (Lazy Loading)
def get_user(user_id):
    # Check cache first
    user = cache.get(f"user:{user_id}")
    if user is None:
        # Cache miss - fetch from database
        # (parameterized query to avoid SQL injection)
        user = database.query("SELECT * FROM users WHERE id = %s", (user_id,))
        # Store in cache
        cache.set(f"user:{user_id}", user, ttl=3600)
    return user
# WRITE-THROUGH CACHE
def update_user(user_id, data):
    # Update database
    database.update("UPDATE users SET ... WHERE id = %s", (user_id,))
    # Update cache
    cache.set(f"user:{user_id}", data, ttl=3600)
# WRITE-BEHIND (Write-Back) CACHE
def update_user_async(user_id, data):
    # Update cache immediately
    cache.set(f"user:{user_id}", data, ttl=3600)
    # Queue database update for later
    queue.enqueue("update_user", user_id, data)
# CACHE EVICTION POLICIES
# - LRU (Least Recently Used)
# - LFU (Least Frequently Used)
# - FIFO (First In, First Out)
# - TTL (Time To Live)
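To make the first policy concrete, here is a minimal LRU cache sketch built on `collections.OrderedDict` (the class name `LRUCache` is illustrative; production systems would use Redis's `maxmemory-policy allkeys-lru` or `functools.lru_cache` rather than hand-rolling this):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # insertion order tracks recency

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            # Evict the least recently used entry (front of the dict)
            self.data.popitem(last=False)
```

LFU would instead track access counts, and FIFO would evict in pure insertion order without the `move_to_end` bump on reads.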
# ============================================
# DATABASE DESIGN
# ============================================
# DATABASE SHARDING (Horizontal Partitioning)
# Split data across multiple databases
# Hash-based Sharding
def get_shard(user_id, num_shards=4):
    return hash(user_id) % num_shards

# Range-based Sharding
def get_shard_by_range(user_id):
    if user_id < 1000000:
        return "shard_1"
    elif user_id < 2000000:
        return "shard_2"
    else:
        return "shard_3"
# DATABASE REPLICATION
# Master-Slave Replication
"""
Master (Write) → Slave 1 (Read)
               → Slave 2 (Read)
               → Slave 3 (Read)
"""
# Master-Master Replication
"""
Master 1 ←→ Master 2
(Both read and write)
"""
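The master-slave diagram implies a read/write split in the application. A minimal routing sketch, assuming a hypothetical `ReplicatedDB` wrapper (real drivers such as psycopg or the MongoDB client expose read-preference settings for this instead):

```python
import random

class ReplicatedDB:
    def __init__(self, master, replicas):
        self.master = master
        self.replicas = replicas

    def route(self, query):
        # Writes must go to the master; reads can be spread across replicas.
        is_write = query.strip().upper().startswith(
            ("INSERT", "UPDATE", "DELETE")
        )
        return self.master if is_write else random.choice(self.replicas)
```

One caveat the diagram hides: replication lag means a read routed to a replica may briefly miss a write that just succeeded on the master (read-your-writes is not guaranteed).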
# DATABASE INDEXING
"""
CREATE INDEX idx_user_email ON users(email);
CREATE INDEX idx_order_date ON orders(created_at);
CREATE INDEX idx_composite ON users(last_name, first_name);
"""
# ============================================
# MESSAGE QUEUES
# ============================================
# PRODUCER-CONSUMER PATTERN
import queue
import threading
message_queue = queue.Queue()
def producer():
    for i in range(10):
        message = f"Message {i}"
        message_queue.put(message)
        print(f"Produced: {message}")

def consumer():
    while True:
        message = message_queue.get()
        print(f"Consumed: {message}")
        # Process message, then mark it as done
        message_queue.task_done()
# Start producer and consumer threads
threading.Thread(target=producer).start()
threading.Thread(target=consumer, daemon=True).start()
# KAFKA-LIKE ARCHITECTURE
"""
Producer → Topic (Partitions) → Consumer Group
                                      ↓
                 [Consumer 1, Consumer 2, Consumer 3]
"""
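In this architecture, which partition a message lands on is decided by hashing its key, so all messages for one key stay ordered on one partition. A minimal sketch of that assignment (the function name `assign_partition` is illustrative; Kafka's actual default partitioner uses murmur2, not MD5):

```python
import hashlib

def assign_partition(key, num_partitions):
    # Stable hash so the same key always maps to the same partition,
    # preserving per-key ordering. MD5 is used only as a cheap,
    # deterministic fingerprint here, not for security.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_partitions
```

Within a consumer group, each partition is consumed by exactly one consumer, which is why per-key ordering survives parallel consumption.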
# ============================================
# MICROSERVICES PATTERNS
# ============================================
# API GATEWAY PATTERN
"""
Client → API Gateway → [Auth Service, User Service, Order Service]
             ↓
  [Rate Limiting, Logging, Caching]
"""
# SERVICE DISCOVERY
# Services register themselves and discover other services
import random

service_registry = {
    "user-service": ["http://10.0.1.1:8080", "http://10.0.1.2:8080"],
    "order-service": ["http://10.0.2.1:8080"],
    "payment-service": ["http://10.0.3.1:8080"],
}

def discover_service(service_name):
    instances = service_registry.get(service_name, [])
    # Return a random instance (simple client-side load balancing)
    return random.choice(instances) if instances else None
# CIRCUIT BREAKER PATTERN
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, timeout=60):
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.state = "CLOSED"  # CLOSED, OPEN, HALF_OPEN
        self.last_failure_time = None

    def call(self, func, *args, **kwargs):
        if self.state == "OPEN":
            if time.time() - self.last_failure_time > self.timeout:
                # Timeout elapsed: allow one trial request through
                self.state = "HALF_OPEN"
            else:
                raise Exception("Circuit breaker is OPEN")
        try:
            result = func(*args, **kwargs)
            self.on_success()
            return result
        except Exception:
            self.on_failure()
            raise

    def on_success(self):
        self.failure_count = 0
        self.state = "CLOSED"

    def on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.state = "OPEN"
# ============================================
# RATE LIMITING
# ============================================
# TOKEN BUCKET ALGORITHM
import time
class TokenBucket:
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.tokens = capacity
        self.refill_rate = refill_rate  # tokens added per second
        self.last_refill = time.time()

    def consume(self, tokens=1):
        self._refill()
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False

    def _refill(self):
        now = time.time()
        elapsed = now - self.last_refill
        tokens_to_add = elapsed * self.refill_rate
        self.tokens = min(self.capacity, self.tokens + tokens_to_add)
        self.last_refill = now
# Usage
rate_limiter = TokenBucket(capacity=100, refill_rate=10)  # 10 tokens/sec
if rate_limiter.consume():
    # Process request
    pass
else:
    # Reject request (429 Too Many Requests)
    pass
# SLIDING WINDOW RATE LIMITING
from collections import deque
import time
class SlidingWindowRateLimiter:
    def __init__(self, max_requests, window_size):
        self.max_requests = max_requests
        self.window_size = window_size  # seconds
        self.requests = deque()

    def allow_request(self):
        now = time.time()
        # Remove old requests outside the window
        while self.requests and self.requests[0] < now - self.window_size:
            self.requests.popleft()
        if len(self.requests) < self.max_requests:
            self.requests.append(now)
            return True
        return False
# ============================================
# SYSTEM DESIGN EXAMPLES
# ============================================
# URL SHORTENER (like bit.ly)
"""
Requirements:
- Generate short URL from long URL
- Redirect short URL to original URL
- Handle billions of URLs
- Low latency
Design:
1. Hash Function: Generate unique short code
2. Database: Store mapping (short_code → long_url)
3. Cache: Redis for frequently accessed URLs
4. Load Balancer: Distribute traffic
Architecture:
Client → Load Balancer → [App Servers] → Cache (Redis)
                                       → Database (Sharded)
Short Code Generation:
- Base62 encoding (a-z, A-Z, 0-9) = 62^7 = 3.5 trillion URLs
- MD5 hash + take first 7 characters
- Auto-increment ID + Base62 encode
"""
import hashlib

def generate_short_code(long_url):
    # MD5 hash of the URL (non-cryptographic use: just a cheap fingerprint)
    hash_hex = hashlib.md5(long_url.encode()).hexdigest()
    # Take the first 7 characters as the short code
    return hash_hex[:7]
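The design notes also mention the third option, "Auto-increment ID + Base62 encode", which avoids the hash-collision problem entirely. A minimal sketch (the alphabet ordering here is one common convention; any fixed ordering of the 62 characters works as long as encode and decode agree):

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(n):
    # Convert a non-negative integer (e.g. an auto-increment row ID)
    # into a Base62 string: repeated divmod by 62, digits collected
    # least-significant first, then reversed.
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))
```

Because IDs are unique, so are the codes; the trade-off versus hashing is that sequential codes are guessable unless you add an offset or shuffle step.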
# TWITTER FEED DESIGN
"""
Requirements:
- Post tweets
- Follow users
- View timeline (tweets from followed users)
- Handle millions of users
Design:
1. Tweet Service: Create and store tweets
2. Timeline Service: Generate user timelines
3. Fan-out Service: Push tweets to followers' timelines
Architecture:
User → API Gateway → Tweet Service → Database
                   → Timeline Service → Cache (Redis)
                   → Fan-out Service → Message Queue
Timeline Generation:
- Fan-out on write: Pre-compute timelines (fast reads)
- Fan-out on read: Compute on demand (slow reads, less storage)
- Hybrid: Fan-out for most users, on-demand for celebrities
"""
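The fan-out-on-write strategy described above can be sketched in a few lines. This is a toy in-memory model (the `followers` and `timelines` structures stand in for a social graph store and per-user Redis lists):

```python
from collections import defaultdict

# who follows whom: author -> list of followers (toy data)
followers = {"alice": ["bob", "carol"]}

# precomputed per-user timelines, newest tweet first
timelines = defaultdict(list)

def post_tweet(author, tweet):
    # Fan-out on write: at post time, push the tweet into every
    # follower's timeline so their reads are a single list fetch.
    for follower in followers.get(author, []):
        timelines[follower].insert(0, (author, tweet))

post_tweet("alice", "hello world")
```

The celebrity problem is visible here: one post by a user with 50M followers means 50M writes, which is exactly why the hybrid approach falls back to fan-out-on-read for high-follower accounts.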
# NETFLIX VIDEO STREAMING
"""
Requirements:
- Upload and store videos
- Stream videos to millions of users
- Adaptive bitrate streaming
- Low latency
Design:
1. Upload Service: Process and encode videos
2. CDN: Distribute content globally
3. Streaming Service: Serve video chunks
4. Recommendation Service: Suggest content
Architecture:
User → CDN (Edge Servers) → Origin Servers
                          → Encoding Service
                          → Storage (S3)
Video Processing:
- Transcode to multiple resolutions (1080p, 720p, 480p)
- Split into chunks (HLS/DASH)
- Store in CDN for fast delivery
"""
# ============================================
# MONITORING & OBSERVABILITY
# ============================================
# Key Metrics to Monitor:
# - Latency (p50, p95, p99)
# - Throughput (requests/sec)
# - Error rate (%)
# - CPU/Memory usage
# - Database connections
# - Cache hit rate
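Since p50/p95/p99 latency leads that list, here is a minimal nearest-rank percentile sketch (monitoring systems like Prometheus estimate these from histograms instead of sorting raw samples, but the idea is the same):

```python
import math

def percentile(samples, pct):
    # Nearest-rank percentile: sort the samples and take the value
    # at rank ceil(pct% of n). p99 >> p50 signals a long tail.
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]
```

Averages hide tail latency; a service can have a fine mean while 1% of users wait seconds, which is why p95/p99 are monitored separately.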
# Logging Levels:
# DEBUG → INFO → WARNING → ERROR → CRITICAL
# Distributed Tracing:
# Track requests across multiple services
# Tools: Jaeger, Zipkin, OpenTelemetry
# Health Checks:
def health_check():
    # check_database/check_cache/check_queue are app-specific probes
    checks = {
        "database": check_database(),
        "cache": check_cache(),
        "queue": check_queue(),
    }
    all_healthy = all(checks.values())
    status = "healthy" if all_healthy else "unhealthy"
    return {"status": status, "checks": checks}
EXERCISES
Design URL Shortener (like bit.ly) (medium)
Design Twitter/Social Media Feed (hard)
Design Netflix/Video Streaming (hard)
Design Uber/Ride Sharing System (hard)
Design Distributed Cache (hard)