System Design Topics
Key system design principles distilled from engineering blogs at top companies, organized by difficulty and topic.
Basic
No prior system design experience needed
Single Primary + Read Replicas
Use a single primary for writes and scale reads horizontally with many replicas across regions.
Caching Reduces Database Load
Put a cache layer in front of your database to serve frequently read data without hitting the DB every time.
Connection Pooling
Use a connection pooler to reuse database connections instead of creating a new one per request.
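As a rough sketch, a pool can be modeled as a bounded queue of open connections that are checked out and returned rather than closed. This toy version uses sqlite3 as a stand-in database; the class and method names are illustrative, not any particular pooler's API:

```python
import queue
import sqlite3

class ConnectionPool:
    """Minimal connection-pool sketch (illustrative, not production code)."""
    def __init__(self, size, db=":memory:"):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            # Open connections once, up front, instead of per request.
            self._pool.put(sqlite3.connect(db, check_same_thread=False))

    def acquire(self):
        return self._pool.get()   # blocks if every connection is in use

    def release(self, conn):
        self._pool.put(conn)      # return the connection instead of closing it

pool = ConnectionPool(size=2)
conn = pool.acquire()
one = conn.execute("SELECT 1").fetchone()[0]
pool.release(conn)
```

Real poolers (PgBouncer, HikariCP, SQLAlchemy's pool) add health checks, timeouts, and connection recycling on top of this basic check-out/check-in loop.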
Rate Limiting Protects Your System
Limit how many requests a client or endpoint can make in a time window to prevent overwhelming the system.
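One common implementation is a token bucket: the bucket refills at a steady rate and each request spends a token, which permits short bursts while capping the sustained rate. The sketch below is illustrative; class and parameter names are assumptions, not a specific library's API:

```python
import time

class TokenBucket:
    """Token-bucket rate limiter sketch: allows bursts up to `capacity`,
    sustains `refill_rate` requests per second after that."""
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.refill_rate = refill_rate      # tokens added per second
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=3, refill_rate=1)   # burst of 3, then 1 req/s
results = [bucket.allow() for _ in range(5)]      # 5 back-to-back requests
```

The first three requests drain the burst allowance and the rest are rejected until tokens refill.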
High Availability with Standby Replicas
Keep a synchronized standby ready to take over if the primary server fails, minimizing downtime.
Performance vs Scalability
A performance problem means your system is slow for a single user. A scalability problem means it's fast for one but slow under heavy load.
Latency vs Throughput
Latency is the time to complete one action. Throughput is how many actions complete per unit time. Aim for maximal throughput with acceptable latency.
CAP Theorem
In a distributed system, you can only guarantee two of three: Consistency, Availability, and Partition Tolerance. Since networks fail, you must choose between CP and AP.
CP vs AP Tradeoff
Choose CP when your business requires atomic reads and writes. Choose AP when the system must stay responsive even if data is temporarily stale.
Weak Consistency
After a write, reads may or may not see it. Best-effort delivery is acceptable when losing some data is tolerable.
Eventual Consistency
After a write, reads will eventually see it (typically within milliseconds). Data is replicated asynchronously.
Strong Consistency
After a write, every subsequent read returns the updated value. Data is replicated synchronously.
Availability in Numbers
Availability is measured in 'nines' — 99.9% (three 9s) allows ~8h 46min downtime/year, while 99.99% (four 9s) allows only ~52 minutes.
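The arithmetic behind the nines is a one-liner: allowed downtime is simply the unavailable fraction of a year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600

def downtime_minutes_per_year(availability):
    """Allowed downtime per (non-leap) year for a given availability fraction."""
    return (1 - availability) * MINUTES_PER_YEAR

three_nines = downtime_minutes_per_year(0.999)    # ~526 min, about 8h 46min
four_nines = downtime_minutes_per_year(0.9999)    # ~53 min
```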
DNS Basics
DNS translates domain names to IP addresses using a hierarchical system of servers. It's the first step in every web request.
Load Balancer Overview
Load balancers distribute incoming requests across multiple servers, preventing overload and eliminating single points of failure.
Horizontal vs Vertical Scaling
Vertical scaling (scale up) means bigger hardware. Horizontal scaling (scale out) means more machines. Horizontal is cheaper and more resilient but adds complexity.
ACID Properties
ACID (Atomicity, Consistency, Isolation, Durability) guarantees that database transactions are reliable even during failures.
Key-Value Stores
Key-value stores offer O(1) reads and writes, backed by memory or SSD. Best for simple data models and rapidly-changing data like caches.
Caching Layers Overview
Caching can happen at every layer: client (browser/OS), CDN, web server (reverse proxy), application (Redis/Memcached), and database.
Cache-Aside (Lazy Loading)
The application checks the cache first. On a miss, it loads from the database, stores the result in cache, then returns it. Only requested data gets cached.
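The cache-aside flow fits in a few lines; here plain dicts stand in for the cache and the database, and the key/value names are illustrative:

```python
db = {"user:1": {"name": "Ada"}}   # stand-in for the database
cache = {}                         # stand-in for Redis/Memcached
misses = 0

def get_user(key):
    """Cache-aside: check the cache, fall back to the DB, populate on miss."""
    global misses
    if key in cache:
        return cache[key]          # cache hit: no DB round trip
    misses += 1
    value = db[key]                # cache miss: load from the database
    cache[key] = value             # store so subsequent reads are hits
    return value

first = get_user("user:1")         # miss: hits the DB and fills the cache
second = get_user("user:1")        # hit: served from the cache
```

A real implementation also sets a TTL on the cached entry and invalidates it when the underlying row changes.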
Sticky Sessions & Centralized Session State
Load-balanced servers break server-local sessions. Fix it with a centralized session store or load-balancer-injected cookies that pin users to a backend.
RAID: Disk Redundancy Levels
RAID combines multiple disks for performance and/or redundancy. RAID 0 stripes for speed, RAID 1 mirrors for safety, RAID 5/6 balance economy and fault tolerance.
Static Content Pre-generation
Accept dynamic input but serve pre-rendered static HTML files. Web servers are extremely fast at serving static content, avoiding per-request computation.
TCP vs UDP
TCP guarantees ordered, reliable delivery via handshakes and retransmission. UDP is connectionless and faster but may lose or reorder packets.
Object Caching vs Query Caching
Caching assembled objects instead of raw query results is easier to invalidate and enables async pre-assembly by worker servers.
Capacity Estimation
Capacity estimation converts product requirements into concrete numbers — DAU, QPS, storage, and bandwidth — so you can size infrastructure before building it.
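A worked back-of-envelope example, where every input number is an assumption chosen for illustration:

```python
# Assumed product requirements (all numbers illustrative):
dau = 10_000_000              # daily active users
requests_per_user = 20        # average requests per user per day

# QPS: spread daily requests over 86,400 seconds, then allow for peaks.
avg_qps = dau * requests_per_user / 86_400    # ~2,315 QPS
peak_qps = avg_qps * 3                        # rule of thumb: peak ~2-3x average

# Storage: assume 10% of requests are ~1 KB writes.
write_ratio = 0.1
bytes_per_write = 1_000
storage_per_day_gb = dau * requests_per_user * write_ratio * bytes_per_write / 1e9
```

With these inputs the system needs to handle roughly 2,300 QPS on average (~7,000 at peak) and accumulates about 20 GB of new data per day.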
Intermediate
For early-career engineers starting to design systems
Cache Stampede Prevention
Use cache locking/leasing so only one request fetches from the DB on a miss — others wait for the repopulated cache.
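A minimal single-process sketch of the idea, with a `threading.Lock` standing in for the lease and a double-check so late arrivals reuse the winner's result:

```python
import threading

cache = {}
lock = threading.Lock()
db_fetches = 0

def expensive_db_load(key):
    """Stand-in for the slow database query we want to run only once."""
    global db_fetches
    db_fetches += 1
    return f"value-for-{key}"

def get_with_lock(key):
    """Stampede prevention: only the lock holder repopulates the cache."""
    if key in cache:
        return cache[key]
    with lock:                   # one request at a time past this point
        if key in cache:         # double-check: another thread may have filled it
            return cache[key]
        cache[key] = expensive_db_load(key)
        return cache[key]

# Ten concurrent requests for the same missing key trigger one DB fetch.
threads = [threading.Thread(target=get_with_lock, args=("hot",)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Distributed versions use the same shape with a lease in the cache itself (for example Redis `SET key NX` with an expiry) instead of an in-process lock.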
Workload Isolation (Noisy Neighbor)
Route low-priority and high-priority workloads to separate instances so one can't degrade the other.
Avoid Complex Joins in OLTP
Multi-table joins are an OLTP anti-pattern. Break them apart and move join logic to the application layer.
Offload Writes to Sharded Systems
Migrate write-heavy, shardable workloads to horizontally scalable systems to protect the primary.
Active-Passive Failover
Heartbeats between an active and passive server detect failures. The passive takes over the active's IP when a heartbeat is missed.
Active-Active Failover
Both servers actively handle traffic, spreading load between them. If one fails, the other absorbs all traffic.
CDN — Push vs Pull
Push CDNs receive content when you upload it (good for low-traffic, rarely changing content). Pull CDNs fetch content on first request (good for high-traffic sites).
Layer 4 vs Layer 7 Load Balancing
Layer 4 routes based on IP/port (fast, simple). Layer 7 routes based on request content like URL, headers, and cookies (flexible, smarter).
Reverse Proxy
A reverse proxy sits in front of backend servers, providing a unified interface while adding security, caching, compression, and SSL termination.
Master-Slave Replication
The master handles all writes and replicates them to one or more slaves that serve read-only traffic.
Federation (Functional Partitioning)
Split databases by function (e.g., users, products, forums) to reduce per-database traffic and improve cache locality.
Denormalization
Store redundant copies of data to avoid expensive joins, trading write complexity for read performance.
SQL Tuning Essentials
Benchmark, profile, then optimize: tighten schemas, add proper indices, avoid expensive joins, and partition hot tables.
Document Stores
Document stores center around JSON/XML documents, providing flexible schemas and APIs to query document internals. Best for semi-structured, occasionally changing data.
SQL vs NoSQL Decision Guide
Choose SQL for structured data, complex joins, and transactions. Choose NoSQL for flexible schemas, massive scale, and high-throughput workloads.
Write-Through Cache
The application writes to the cache, and the cache synchronously writes to the database. Data is never stale, but writes are slower.
Write-Behind (Write-Back) Cache
The application writes to the cache, which asynchronously flushes to the database later. Fast writes, but risk of data loss if the cache crashes.
DNS Round Robin Drawbacks
DNS round robin is the simplest load-distribution scheme — the DNS server cycles through IPs — but caching and lack of health awareness make it unreliable.
Database Partitioning by User Attribute
Split users across servers by a simple attribute (name range, school, geography) for a quick horizontal scaling win before investing in full sharding.
Memcached: Shared In-Memory Cache Tier
Memcached is a dedicated in-memory key-value daemon that multiple web servers share. LRU eviction automatically discards cold entries when memory is full.
Network Security Tiers (Defense in Depth)
Restrict traffic between architecture tiers with port-level firewalls: only HTTP in from the internet, only MySQL between web and DB servers.
MySQL Query Cache
MySQL's built-in query cache returned results for repeated identical queries without re-executing them. Note it was deprecated in MySQL 5.7 and removed in 8.0, so on modern versions use an external cache such as Redis or Memcached instead.
Web Layer vs Application Layer
Separating the web layer from the application (platform) layer lets you scale and configure each independently.
Microservices
A suite of independently deployable, small, modular services. Each runs a unique process and communicates via lightweight protocols.
Service Discovery
Systems like Consul, etcd, and ZooKeeper help services find each other by tracking registered names, addresses, and ports.
Message Queues
Message queues decouple producers from consumers: a publisher posts a job, a worker picks it up and processes it in the background.
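A toy producer/consumer pair using Python's standard-library `queue` as a stand-in for a real broker such as RabbitMQ or SQS; the doubling "work" is purely illustrative:

```python
import queue
import threading

jobs = queue.Queue()
results = []

def worker():
    """Consumer: pull jobs off the queue and process them in the background."""
    while True:
        job = jobs.get()
        if job is None:            # sentinel value shuts the worker down
            break
        results.append(job * 2)    # stand-in for real processing
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()

for n in [1, 2, 3]:
    jobs.put(n)                    # producer posts jobs and moves on immediately

jobs.join()                        # wait until every posted job is processed
jobs.put(None)                     # then stop the worker
t.join()
```

The producer never waits on the consumer; the queue absorbs bursts and the worker drains it at its own pace, which is exactly the decoupling real brokers provide across processes and machines.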
Task Queues
Task queues receive tasks with their data, execute them, and return results. They support scheduling and are ideal for compute-intensive background work.
RPC vs REST
RPC exposes behaviors (actions). REST exposes resources (data). RPC is common for internal services; REST is preferred for public APIs.
REST API Design
RESTful APIs identify resources by URI, modify them with HTTP verbs, signal errors with status codes, and link related resources via hypermedia (HATEOAS).
Types of Load Balancers
Load balancers come in three configuration types (software, hardware, cloud) and three functional types (L4, L7, GSLB), each with distinct cost, flexibility, and performance tradeoffs.
Types of Caching
Caches come in four architectural types — application server, distributed, global, and CDN — each trading off simplicity, scalability, and latency differently.
Types of Databases
Each database type optimizes for a different access pattern and consistency model — RDBMS for transactions and joins, NoSQL for flexible schemas and horizontal scale, NewSQL for global ACID, and time-series for sequential telemetry.
Message Queues Deep Dive
Message queues decouple producers from consumers through an intermediate buffer, enabling asynchronous communication, independent scaling, and fault tolerance across distributed services.
Rate Limiting
Rate limiting controls how many requests a client can make in a given time window, protecting systems from abuse, DoS attacks, and resource exhaustion while ensuring fair access.
Database Indexing
Indexes are auxiliary data structures that speed up reads at the cost of slower writes and extra storage. Choose index type based on query patterns: B-tree for range scans, hash for equality lookups, inverted for full-text search.
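A toy contrast between the two access patterns: a sorted list stands in for a B-tree's ordered keys (efficient range scans), while a dict stands in for a hash index (equality lookups only, no ordering). The data is illustrative:

```python
import bisect

# 15 rows keyed 0, 7, 14, ... 98.
rows = [(i, f"user{i}") for i in range(0, 100, 7)]

btree_keys = [k for k, _ in rows]        # kept sorted, like B-tree leaf keys
hash_index = {k: v for k, v in rows}     # O(1) equality lookup, unordered

def range_scan(lo, hi):
    """B-tree strength: binary-search to [lo, hi) without touching other keys."""
    i = bisect.bisect_left(btree_keys, lo)
    j = bisect.bisect_left(btree_keys, hi)
    return btree_keys[i:j]

in_range = range_scan(10, 40)    # keys between 10 and 40
exact = hash_index.get(21)       # single-key lookup
```

The hash index cannot answer the range query without scanning everything, which is why query patterns, not data volume, should drive index choice.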
Real-Time Communication
Polling is simplest but wasteful, long polling reduces wasted requests, WebSockets provide full-duplex real-time channels, and SSE offers lightweight one-way server push — choose based on directionality, latency, and infrastructure complexity.
Storage Types
Object storage is best for large unstructured blobs like images and videos, block storage provides raw disk volumes for databases and VMs, and file storage offers shared hierarchical access — store metadata in a database and media in object storage.
Reliability and Resilience Patterns
Build resilience by eliminating single points of failure through redundancy, protecting cascading failures with circuit breakers, making retries safe with exponential backoff and idempotency, and designing for graceful degradation under overload.
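As one concrete piece of this, retry delays with exponential backoff and "full jitter" can be computed as below; the function name and default values are illustrative:

```python
import random

def backoff_delays(base=0.1, cap=10.0, attempts=5, seed=0):
    """Full-jitter backoff: delay_n is uniform in [0, min(cap, base * 2**n)].

    Jitter spreads retries out so failed clients don't all hammer the
    service again at the same instant.
    """
    rng = random.Random(seed)   # seeded here only so the sketch is reproducible
    return [rng.uniform(0, min(cap, base * 2 ** n)) for n in range(attempts)]

delays = backoff_delays()
```

Pairing this with idempotent request handling (e.g. client-supplied request IDs) makes the retries themselves safe to execute more than once.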
Observability
Observability rests on three pillars — logs capture discrete events, metrics track numeric aggregates over time, and traces follow a single request across services — together they let you detect, diagnose, and resolve production issues.
Advanced
For mid-to-senior engineers operating at scale
MVCC Tradeoffs in PostgreSQL
PostgreSQL's MVCC copies the entire row on every update, causing write amplification, dead tuple bloat, and vacuum pressure.
Multi-Layer Rate Limiting
Apply rate limiting at every layer — application, connection pooler, proxy, and query — for defense in depth.
Safe Schema Migrations at Scale
Only allow lightweight schema changes in production. Anything that rewrites the table is too dangerous at scale.
Cascading Replication for Replica Scaling
When the primary can't stream WAL to all replicas, use intermediate replicas to relay WAL downstream.
Cascading Failure Prevention
The classic failure loop is: load spike -> latency rise -> timeouts -> retries -> amplified load. Break it at every link.
Master-Master Replication
Both masters serve reads and writes, coordinating with each other. If either goes down, the other continues operating.
Sharding
Distribute data across different databases so each manages only a subset. Reduces traffic, replication, and index size per shard.
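A minimal hash-based shard router, with made-up shard names. Note that plain modulo hashing reshuffles most keys when the shard count changes, which is why production systems often use consistent hashing or directory-based lookup instead:

```python
import hashlib

SHARDS = ["db0", "db1", "db2", "db3"]   # illustrative shard names

def shard_for(user_id):
    """Route a key to a shard via a stable hash (md5 here for determinism)."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# The same key always lands on the same shard, and keys spread across all shards.
assignments = {uid: shard_for(uid) for uid in range(1000)}
per_shard = {s: sum(1 for v in assignments.values() if v == s) for s in SHARDS}
```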
Wide Column Stores
Wide column stores (Bigtable, HBase, Cassandra) use column families with row keys. Built for very large datasets with high availability and scalability.
Graph Databases
Graph databases represent data as nodes and relationships (edges). Optimized for complex many-to-many relationships like social networks.
Refresh-Ahead Cache
The cache automatically refreshes recently accessed entries before their TTL expires, reducing read latency if predictions are accurate.
Multi-Data-Center & Availability Zones
A single data center is a single point of failure. Distribute across availability zones with independent power and networking, using global DNS to route users.
Back Pressure
When queues grow beyond a threshold, reject new work with HTTP 503 and let clients retry with exponential backoff. This preserves throughput for jobs already in the queue.
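The threshold-and-reject behavior can be sketched with a bounded standard-library queue; the job names and the queue size are illustrative:

```python
import queue

work = queue.Queue(maxsize=3)   # the bound is the back-pressure threshold

def submit(job):
    """Admit work while there is room; shed load once the queue is full."""
    try:
        work.put_nowait(job)
        return 202              # Accepted: job enqueued for processing
    except queue.Full:
        return 503              # Service Unavailable: client should back off and retry
    
codes = [submit(f"job-{i}") for i in range(5)]
```

Rejecting at the boundary keeps the queue short, so jobs that were admitted still finish quickly instead of timing out behind an unbounded backlog.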