System Design Interview Cheat Sheet
A quick-reference guide for system design interviews. Key questions and trade-off principles lead each section, with tips and details underneath.
Delivery Framework
Adapted from Hello Interview — System Design Delivery Framework
Requirements
Get a clear understanding of the system you are designing. Break requirements into functional (core features), non-functional (system qualities), and capacity estimation (only if it influences design). Aim for the top 3 in each category — a long list hurts more than it helps.
Functional Requirements
- Define "Users/Clients should be able to..." statements for the top 3 core features.
- Ask targeted questions as if talking to a product manager — "Does the system need to do X?", "What happens if Y?"
- Prioritize ruthlessly — many systems have hundreds of features, but your job is to identify the critical 3.
Non-Functional Requirements
- Phrase as "The system should be..." statements, quantified and in context (e.g., "low-latency search, < 500ms" not just "low latency").
- Consider: CAP theorem (consistency vs. availability), environment constraints (mobile, bandwidth), scalability (bursty traffic, read/write ratio), latency targets, durability, security, fault tolerance, and compliance.
- Identify the top 3–5 that are most relevant to your specific system.
Capacity Estimation
- Skip upfront math unless it directly influences your design — don't calculate storage and QPS just to conclude "it's a lot."
- Tell the interviewer you'll do math inline when it matters (e.g., estimating topic count to decide if a min-heap fits on one node).
- When you do estimate, keep it fast: round aggressively and focus on order-of-magnitude.
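To make this concrete, here is the kind of inline estimate that earns its place: a quick check of whether write volume forces sharding. Every input below is hypothetical.

```python
# Inline estimate: do we need to shard writes? (All inputs are made up.)
dau = 10_000_000                      # assumed daily active users
writes_per_user = 10                  # assumed writes per user per day
daily_writes = dau * writes_per_user  # 100M writes/day
avg_wps = daily_writes / 86_400       # ~1,200 writes/sec on average
peak_wps = avg_wps * 5                # rule of thumb: peak ~5x average
print(f"avg {avg_wps:,.0f}/s, peak {peak_wps:,.0f}/s")
# ~6,000 writes/sec at peak is near the edge of a single well-tuned
# database node, so sharding or batching is worth discussing.
```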
Key Questions
- Who are the users and how will they use the system?
- What are the core features we need to support (top 3)?
- Should the system prioritize consistency or availability?
- Are there latency requirements for specific operations?
- What is the expected scale (DAU, requests per second)?
- Are there geographic, compliance, or environment constraints?
- Spend about 5 minutes here — it sets the direction for the entire interview.
- Keep requirements targeted: the rest of the interview is about meeting the requirements you identify, so be strategic.
- State assumptions out loud so the interviewer can course-correct early.
- Identify the read-to-write ratio — it drives almost every architectural decision.
Core Entities
Identify and list the core entities of your system. This defines terms, surfaces the data central to your design, and gives you a foundation for the API and data model. Keep it as a simple bulleted first draft — you will evolve it as you design.
Key Questions
- Who are the actors in the system? Are they overlapping (e.g., driver vs. rider)?
- What are the nouns or resources necessary to satisfy each functional requirement?
- Are there relationships between entities (1:1, 1:N, N:N)?
- Ask yourself: Who are the actors? What are the nouns or resources needed to satisfy the functional requirements?
- Don't list the full data model yet — you don't know what you don't know. Start small and iterate.
- Choose good names for entities. Clear naming shows communication skill and avoids confusion later.
- This is as simple as jotting down a bulleted list and explaining it to the interviewer as a first draft.
- Once you reach the high-level design and see what state updates per request, you can flesh out columns/fields.
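As an illustration, a first-draft entity list for a hypothetical Twitter-like system can be this sparse; the sketch below uses dataclasses only to make the relationships explicit.

```python
# First-draft core entities for a hypothetical Twitter-like design.
# Deliberately sparse: fields get fleshed out during high-level design,
# once we see what each request actually reads and writes.
from dataclasses import dataclass

@dataclass
class User:
    id: str

@dataclass
class Tweet:
    id: str
    author_id: str    # N:1 -> User
    body: str

@dataclass
class Follow:
    follower_id: str  # N:N between Users, modeled as its own entity
    followee_id: str
```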
API Design
Define the contract between your system and its users before diving into architecture. This maps directly to your functional requirements and will guide your high-level design. Choose a protocol, then define endpoints using your core entities as resources.
- REST (default): Uses HTTP verbs (GET, POST, PUT, DELETE) on resources. Choose this unless you have a specific reason not to.
- GraphQL: Lets clients specify exactly what data they need. Choose when you have diverse clients with different data needs.
- RPC (gRPC): Action-oriented, faster for service-to-service communication. Use for internal APIs when performance is critical.
- Use plural resource nouns (e.g., /tweets, not /tweet).
- Derive the current user from the auth token in the request header — never from request bodies or path parameters.
- For real-time features, layer on WebSockets or SSE after designing the core REST API.
- Version your API (e.g., /v1/tweets) to allow evolution without breaking clients.
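Putting those guidelines together, here is a minimal sketch of the endpoint layer, using FastAPI purely as an illustration; the Twitter-like resources and the resolve_user helper are assumptions, not a prescribed implementation.

```python
# Hypothetical Twitter-like API: plural nouns, versioned paths, current
# user derived from the auth header rather than the request body.
from fastapi import FastAPI, Header

app = FastAPI()

def resolve_user(token: str) -> str:
    # Stub: a real implementation validates the token and looks up the user.
    return "user-123"

@app.post("/v1/tweets", status_code=201)
def create_tweet(body: dict, authorization: str = Header(...)):
    author_id = resolve_user(authorization)
    return {"id": "t1", "author_id": author_id, "text": body.get("text")}

@app.get("/v1/tweets")
def list_tweets(cursor: str | None = None, limit: int = 20):
    # Cursor-based pagination keeps reads cheap at any offset.
    return {"items": [], "next_cursor": None}
```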
Key Questions
- Does the system need real-time updates (WebSockets, SSE) or is request/response sufficient?
- Are there multiple client types with different data needs (mobile vs. web)?
- Do any endpoints need pagination, filtering, or cursor-based iteration?
- Don't overthink protocol choice — default to REST and move on.
- Map each functional requirement to one or more endpoints.
- Keep request/response shapes simple; your interviewer can infer obvious fields.
- Call out authentication explicitly — "the current user is derived from the auth token."
Data Flow (Optional)
For backend or data-processing systems, describe the high-level sequence of actions the system performs on inputs to produce outputs. If your system doesn't involve a long pipeline of actions, skip this step and go straight to the high-level design.
Key Questions
- What are the inputs to the system and what are the expected outputs?
- Is the processing batch or streaming?
- Are there stages that can run in parallel vs. stages that must be sequential?
- Where does data need to be persisted between stages?
- Define the data flow as a simple numbered list of stages (e.g., Fetch → Parse → Extract → Store → Repeat).
- This step is most useful for systems like web crawlers, ETL pipelines, stream processors, or ML inference pipelines.
- Use this flow to inform the boxes and arrows in your high-level design.
- If your system is primarily request/response (e.g., a social network, URL shortener), skip this step entirely.
- Keep it high-level — the goal is to establish the processing stages, not to detail each one.
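For example, the crawler flow above can be written as one function per stage; the stubs below are placeholders, but they show how the stages become the boxes in the high-level design.

```python
def fetch(url: str) -> str:
    # Stub: a real crawler issues an HTTP GET here.
    return f"<html>page at {url}</html>"

def parse(html: str) -> list[str]:
    # Stub: a real parser extracts outbound links.
    return []

def store(url: str, html: str) -> None:
    # Stub: persist the raw page between stages.
    print(f"stored {url} ({len(html)} chars)")

def crawl(frontier: list[str]) -> None:
    seen: set[str] = set()
    while frontier:  # Fetch -> Parse -> Store -> Repeat
        url = frontier.pop()
        if url in seen:
            continue
        seen.add(url)
        html = fetch(url)
        store(url, html)
        frontier.extend(parse(html))

crawl(["https://example.com"])
```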
High-Level Design
Draw boxes and arrows representing the major components and how they interact. Go through your API endpoints one by one, building up the design sequentially. Stay focused on a simple design that meets core functional requirements — layer on complexity in deep dives.
Key Questions
- For each endpoint: what is the write path? What is the read path?
- Which components are stateless and which are stateful?
- Where are the potential bottlenecks in this design?
- Build up the design endpoint-by-endpoint — walk through each API you defined and show how data flows through the system.
- Talk through your thought process as you draw. Be explicit about what state changes with each request.
- When your request reaches the database, document relevant columns/fields next to it visually — no need for full schemas.
- Don't waste time on obvious schema fields (name, email, password hash). Focus on fields that are relevant to your design.
- Note areas where you could add complexity (caches, queues) with a verbal callout, then move on — save it for deep dives.
- A clear 5-box diagram is better than a sprawling 20-box one. Resist the urge to over-engineer at this stage.
- Label arrows with the protocol or action (e.g., REST, gRPC, pub/sub).
- Call out which components are stateless vs. stateful — it affects how you scale later.
Deep Dives
Harden your design by ensuring it meets all non-functional requirements, addressing edge cases, identifying bottlenecks, and responding to interviewer probes. The degree to which you lead deep dives proactively reflects your seniority.
- Walk through each non-functional requirement and show how your design satisfies it.
- Address edge cases: What happens on duplicates, failures, timeouts, or network partitions?
- Identify bottlenecks and propose solutions (caching, sharding, async processing, etc.).
- Discuss trade-offs explicitly — every optimization has a cost.
- Junior candidates can expect the interviewer to point out areas for improvement.
- Senior candidates should proactively identify weak points and lead the discussion.
- Balance being proactive with giving the interviewer room to ask questions and probe your design.
Common Building Blocks
- Walk through the iterative loop: Benchmark → Profile bottlenecks → Address them → Repeat.
- Add caching for read-heavy workloads — discuss cache invalidation strategies.
- Consider database scaling: read replicas, sharding, federation, denormalization.
- Discuss trade-offs explicitly — every decision has a cost (complexity, consistency, latency, $).
- Don't talk over the interviewer — they likely have specific signals they want to get from you.
- Mention CDNs for static content, message queues for async processing, autoscaling for traffic spikes.
Clarifying Questions
Data & Storage
- How much data will we store, and what is the growth rate?
- What is the average size of each record/object?
- Do we need to support search or complex queries?
- What is the data retention policy — do we keep data forever or expire it?
- Is the data relational, document-oriented, or graph-like?
Failure Modes & Operations
- What happens when a dependent service is down?
- How do we handle duplicate requests or idempotency?
- Are there rate-limiting or abuse-prevention requirements?
- Do we need to support backward compatibility or migrations?
- What is the budget — are we optimizing for cost, speed, or reliability?
Users & Functional Scope
- What are the top 3 core features ("Users should be able to...")?
- Who are the primary users (end users, internal services, third-party integrations)?
- Are there different user roles or permission levels?
- What does the write path look like vs. the read path?
- Do users need real-time updates (WebSockets, SSE) or is request/response sufficient?
- Are there multiple client types with different data needs (mobile vs. web)?
Non-Functional Requirements
- CAP Theorem: Should the system prioritize consistency or availability?
- Environment Constraints: Are there constraints like mobile devices, limited bandwidth, or battery life?
- Scalability: Does the system have bursty traffic or unique scaling needs? What is the read vs. write ratio?
- Latency: Are there specific latency targets for critical operations (e.g., search < 500ms)?
- Durability: How important is it that data is never lost (social network vs. banking system)?
- Security: What are the data protection, access control, and compliance requirements (encryption, GDPR, SOC2)?
- Fault Tolerance: How well must the system handle failures? Consider redundancy, failover, and recovery.
- Compliance: Are there legal or regulatory requirements (industry standards, data protection laws)?
Capacity & Scale
- How many daily active users should we plan for?
- What is the expected requests-per-second (reads and writes)?
- Is traffic evenly distributed or are there spikes (e.g., time-of-day, viral events)?
- What is the read-to-write ratio?
- Do we need to support multiple geographic regions?
Trade-offs
Every design decision has costs. Always state the trade-off explicitly: consistency vs. availability, latency vs. throughput, simplicity vs. scalability.
Load Balancers
- L4 (transport) vs. L7 (application): L4 is faster and cheaper but cannot inspect HTTP headers or route by URL; L7 enables smart routing, SSL termination, and sticky sessions but adds latency.
- Hardware vs. Software: Hardware LBs handle massive throughput with very low latency but cost tens of thousands and are inflexible; software LBs (NGINX, HAProxy, Envoy) are cheap and configurable but you manage them.
- Round-robin vs. least-connections vs. consistent hashing: Simple algorithms are predictable but can cause hot spots; adaptive algorithms distribute load better but are harder to debug.
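Since the last point mentions consistent hashing, here is a minimal ring sketch (standard library only): servers and keys hash onto a circle and each key goes to the next server clockwise, so adding or removing a server only remaps the keys in its arc.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes: list[str], vnodes: int = 100):
        # Virtual nodes smooth out uneven gaps between server positions.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First ring position clockwise of the key's hash, wrapping at 0.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["lb-a", "lb-b", "lb-c"])
print(ring.node_for("user-42"))  # stable while the node set is stable
```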
Sharding / Partitioning
- Hash-based vs. range-based partitioning: Hash distributes evenly but makes range queries expensive; range keeps related data together but risks hot partitions.
- Application-level vs. proxy-level sharding: App-level gives full control but couples routing logic to business code; proxy-level (e.g., Vitess, ProxySQL) centralizes routing but adds a network hop.
- More shards vs. fewer shards: More shards improve parallelism but increase cross-shard join complexity and rebalancing difficulty.
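A toy illustration of the first trade-off, with a made-up four-shard setup: hash routing spreads keys evenly but scatters ranges, while range routing keeps time ranges together at the cost of a hot tail shard.

```python
import bisect
import hashlib

NUM_SHARDS = 4
SPLITS = ["2021-01-01", "2022-01-01", "2023-01-01"]  # 4 range shards

def shard_by_hash(user_id: str) -> int:
    # Even spread, but a date-range query must fan out to every shard.
    return int(hashlib.sha1(user_id.encode()).hexdigest(), 16) % NUM_SHARDS

def shard_by_range(created_at: str) -> int:
    # ISO dates compare lexicographically, so range scans touch few
    # shards, but every new write lands on the last one (a hot partition).
    return bisect.bisect(SPLITS, created_at)
```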
Caching
- Cache-aside vs. write-through vs. write-back: Cache-aside is simple but risks stale reads on cache miss; write-through guarantees consistency but adds write latency; write-back is fast but risks data loss on crash.
- Local (in-process) vs. distributed (Redis/Memcached): Local caches are faster with zero network overhead but don't share state across instances; distributed caches share state but add network latency and operational cost.
- TTL-based vs. event-based invalidation: TTL is simple but serves stale data until expiry; event-based is precise but requires infrastructure to propagate changes.
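A minimal cache-aside sketch, assuming redis-py and a stubbed database read; the TTL bounds how stale a cached value can get.

```python
import json
import redis

r = redis.Redis()

def db_fetch(user_id: str) -> dict:
    # Stub for the real database read.
    return {"id": user_id, "name": "Ada"}

def get_user(user_id: str) -> dict:
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:               # hit: the database is untouched
        return json.loads(cached)
    user = db_fetch(user_id)             # miss: read through to the DB
    r.setex(key, 300, json.dumps(user))  # 5-minute TTL bounds staleness
    return user
```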
Databases (SQL vs. NoSQL)
- SQL vs. NoSQL: SQL gives ACID transactions, joins, and a mature ecosystem but is harder to scale horizontally; NoSQL scales easily and handles unstructured data but sacrifices joins and often strong consistency.
- Normalization vs. denormalization: Normalized data avoids duplication and simplifies writes but requires expensive joins at read time; denormalized data is fast to read but harder to keep consistent on writes.
- Single leader vs. multi-leader writes: Single leader simplifies consistency but creates a write bottleneck; multi-leader improves write availability but introduces conflict resolution complexity.
Replication
- Synchronous vs. asynchronous replication: Synchronous guarantees no data loss but increases write latency; asynchronous is fast but risks losing recent writes on leader failure.
- Leader-follower vs. leader-leader: Leader-follower is simpler and avoids conflicts but all writes go through one node; leader-leader supports writes anywhere but requires conflict resolution.
- More replicas vs. fewer replicas: More replicas improve read throughput and fault tolerance but increase replication lag and storage cost.
Consistency Models
- Strong vs. eventual consistency: Strong consistency (linearizability) is easiest to reason about but hurts availability and latency; eventual consistency is highly available but clients may see stale data.
- CP vs. AP (CAP theorem): CP systems (e.g., ZooKeeper, HBase) refuse requests during partitions to stay correct; AP systems (e.g., Cassandra, DynamoDB) stay available but may return stale results.
- Consistency vs. latency (PACELC): Even without partitions, there is a trade-off — stronger consistency requires coordination that adds latency.
Message Queues / Async Processing
- Synchronous vs. asynchronous communication: Sync is simpler to reason about and debug but couples services and blocks on slow downstream; async decouples services and absorbs spikes but adds complexity (retries, idempotency, ordering).
- At-most-once vs. at-least-once vs. exactly-once delivery: At-most-once is fastest but may lose messages; at-least-once guarantees delivery but requires idempotent consumers; exactly-once is safest but expensive and hard to achieve.
- Queue (point-to-point) vs. pub/sub (fan-out): Queues ensure each message is processed once by one consumer; pub/sub broadcasts to all subscribers but makes it harder to track processing state.
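Because at-least-once delivery implies duplicates, consumers dedupe on a message ID, as in the sketch below; the in-memory set and the charge_customer side effect are illustrative stand-ins for durable state and real work.

```python
def charge_customer(message: dict) -> None:
    # Stand-in for the side effect that must not run twice.
    print(f"charged customer for message {message['id']}")

processed: set[str] = set()  # production: Redis set or a DB unique key

def handle(message: dict) -> None:
    if message["id"] in processed:  # duplicate redelivery: safe no-op
        return
    charge_customer(message)
    processed.add(message["id"])    # record only after success
```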
Communication Protocols
- REST vs. gRPC vs. GraphQL: REST is simple and ubiquitous but can over-fetch/under-fetch; gRPC is fast with strong typing but harder to debug from browsers; GraphQL lets clients request exactly what they need but shifts query complexity to the server.
- HTTP/1.1 vs. HTTP/2 vs. WebSockets: HTTP/1.1 is universal but suffers from head-of-line blocking; HTTP/2 multiplexes streams but adds complexity; WebSockets enable real-time bidirectional communication but require persistent connections.
- Polling vs. long-polling vs. SSE vs. WebSockets: Polling is simple but wastes bandwidth; long-polling reduces waste but ties up server connections; SSE is efficient for server-to-client streams; WebSockets are best for bidirectional real-time but hardest to scale.
Common Patterns
Adapted from Hello Interview — Common Patterns
Realtime Updates
Many systems need to push updates to users as they happen — chat apps, notifications, live dashboards. Start with HTTP polling until it no longer serves your needs, then consider SSE or WebSockets. On the server side, pub/sub services decouple publishers and subscribers, while stateful servers in a consistent hash ring handle heavier processing.
- Protocol choice: HTTP polling is simplest; SSE is efficient for server-to-client streams; WebSockets enable bidirectional real-time but are hardest to scale.
- Pub/sub (e.g., Redis Pub/Sub, Kafka) decouples publishers from subscribers and is a common server-side backbone for fan-out.
- Stateful servers in a consistent hash ring can handle heavier per-connection processing (e.g., Google Docs collaborative editing).
- Connection management at scale is non-trivial — consider sticky sessions, connection draining, and reconnection strategies.
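A sketch of the pub/sub backbone, assuming redis-py; the channel naming is made up, and a real server would forward each message to the WebSocket connections it holds rather than print it.

```python
import redis

r = redis.Redis()

def publish_message(room: str, text: str) -> None:
    r.publish(f"room:{room}", text)  # fire-and-forget fan-out

def listen(room: str) -> None:
    p = r.pubsub()
    p.subscribe(f"room:{room}")
    for msg in p.listen():
        if msg["type"] == "message":
            # A real server would write this to the open WebSockets.
            print(msg["data"].decode())
```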
Long-Running Tasks
Operations like video encoding, report generation, or bulk processing take too long for synchronous handling. The pattern splits work into immediate acknowledgment (return a job ID) and background processing via worker pools pulling from a queue. Only use this when tasks are genuinely long-running — synchronous responses simplify architecture dramatically.
- Web server validates the request, pushes a job to a queue (Redis, Kafka, SQS), and returns a job ID within milliseconds.
- Separate worker processes pull jobs from the queue and execute the actual work, enabling independent scaling.
- Track job status so clients can poll for completion. Handle retries and dead letter queues for poison messages.
- Don't reach for async processing prematurely — if the job is short, returning results synchronously gives clearer back-pressure and better UX.
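A minimal version of that split, using a Redis list as the queue; the key names and status values are illustrative, not a standard.

```python
import json
import uuid
import redis

r = redis.Redis()

def submit_job(payload: dict) -> str:
    # Acknowledge immediately: enqueue and hand back a job ID.
    job_id = str(uuid.uuid4())
    r.hset(f"job:{job_id}", mapping={"status": "queued"})
    r.lpush("jobs", json.dumps({"id": job_id, **payload}))
    return job_id

def worker_loop() -> None:
    while True:
        _, raw = r.brpop("jobs")  # blocks until a job is available
        job = json.loads(raw)
        r.hset(f"job:{job['id']}", mapping={"status": "running"})
        # ... the actual long-running work happens here ...
        r.hset(f"job:{job['id']}", mapping={"status": "done"})
```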
Dealing with Contention
When multiple users access the same resource simultaneously — booking the last ticket, bidding on an auction — you need mechanisms to prevent race conditions. Solutions range from database-level locking to distributed coordination. Start with single-database solutions before scaling to distributed approaches.
- Pessimistic locking (SELECT FOR UPDATE) is simple but reduces throughput; optimistic concurrency control (version checks) allows more parallelism but requires retry logic.
- Distributed locks (e.g., Redlock) and two-phase commit handle cross-service coordination but add complexity.
- Queue-based serialization forces sequential processing of contended resources, trading throughput for correctness.
- Databases are built to solve contention on a single node — once you shard across multiple databases, you inherit the coordination challenges they used to handle for you.
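Optimistic concurrency from the first point above, sketched with sqlite3 and a hypothetical seats table: the UPDATE only wins if the version is unchanged since our read; otherwise we re-read and retry.

```python
import sqlite3

def book_seat(db: sqlite3.Connection, seat_id: int) -> bool:
    for _ in range(3):  # bounded retries on lost races
        row = db.execute(
            "SELECT status, version FROM seats WHERE id = ?", (seat_id,)
        ).fetchone()
        if row is None or row[0] != "free":
            return False  # already taken
        cur = db.execute(
            "UPDATE seats SET status = 'booked', version = version + 1 "
            "WHERE id = ? AND version = ?",
            (seat_id, row[1]),
        )
        db.commit()
        if cur.rowcount == 1:  # our write won the race
            return True
    return False
```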
Scaling Reads
Read traffic typically grows much faster than writes (often 10:1 to 100:1+). The solution follows a natural progression: optimize within your database (indexing, denormalization), scale horizontally with read replicas, then add external caching layers like Redis and CDNs.
- Start with database indexing and query optimization before adding infrastructure.
- Read replicas distribute load but introduce replication lag — decide what staleness is acceptable.
- Caching (Redis, Memcached) dramatically reduces database load but requires a cache invalidation strategy (TTL vs. event-based).
- CDNs cache static and semi-static content at the edge, reducing latency and origin load.
- Hot keys (millions of users requesting the same popular content) can overwhelm a single cache node — consider replication or request coalescing.
Scaling Writes
When individual database servers hit write limits, you need horizontal sharding, batching, and intelligent load management. Selecting good partition keys that distribute load evenly while keeping related data together is the central challenge.
- Horizontal sharding distributes data across servers; vertical partitioning separates different data types — both reduce per-node load.
- Partition key selection is critical: good keys distribute evenly and keep related data co-located; bad keys create hot partitions.
- Write queues buffer temporary spikes; load shedding prioritizes important writes during overload.
- Batching groups multiple writes together to reduce per-operation overhead (e.g., Kafka producer batching, database bulk inserts).
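A toy batching sketch with sqlite3 and a hypothetical events table: writes buffer in memory and flush as one bulk insert. A real writer would also flush on a timer so a quiet period does not strand the buffer.

```python
import sqlite3

class BatchWriter:
    def __init__(self, db: sqlite3.Connection, flush_at: int = 100):
        self.db = db
        self.flush_at = flush_at
        self.buf: list[tuple[str, str]] = []

    def add(self, user_id: str, event: str) -> None:
        self.buf.append((user_id, event))
        if len(self.buf) >= self.flush_at:  # amortize per-write overhead
            self.flush()

    def flush(self) -> None:
        if not self.buf:
            return
        self.db.executemany(
            "INSERT INTO events (user_id, event) VALUES (?, ?)", self.buf
        )
        self.db.commit()
        self.buf.clear()
```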
Handling Large Files
Large files (videos, images, documents) need special handling. Instead of routing gigabytes through your application servers, use presigned URLs for direct client-to-storage transfers and CDN delivery. Your app server generates temporary, scoped credentials — the client uploads/downloads directly.
- Presigned URLs let clients upload directly to blob storage (S3) without proxying through your servers, eliminating a major bottleneck.
- Downloads come from CDNs with signed URLs for access control, providing global distribution and low latency.
- State synchronization between your database metadata and blob storage is a key challenge — use storage event notifications to stay consistent.
- Support resumable uploads and progress tracking for large files; handle upload failures gracefully.
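A presigned-upload sketch with boto3; the bucket name and key scheme are assumptions. The client then PUTs the file directly to the returned URL, so the bytes never touch the app tier.

```python
import boto3

s3 = boto3.client("s3")

def presign_upload(user_id: str, filename: str) -> str:
    # Short-lived, scoped to a single key under the user's prefix.
    return s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": "my-uploads", "Key": f"{user_id}/{filename}"},
        ExpiresIn=600,  # valid for 10 minutes
    )
```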
Multi-Step Processes
Complex workflows like order fulfillment, user onboarding, or payment processing involve multiple services and must survive failures. Solutions range from simple orchestration to workflow engines (Temporal, AWS Step Functions) that handle state management, failure recovery, and retry logic automatically.
- Orchestration (central coordinator) is simpler to reason about; choreography (event-driven) is more decoupled but harder to debug.
- Event sourcing lets each step emit events that trigger subsequent steps, providing a natural audit trail.
- Workflow engines (Temporal, Step Functions) handle state persistence, retries, and exactly-once execution out of the box.
- The key insight: move from scattered state management and manual error handling to declarative workflow definitions.
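To show what the engines automate, here is a toy orchestrator (emphatically not Temporal's or Step Functions' real API): it tracks which step is next, retries with backoff, and could resume from persisted state after a crash.

```python
import time

def run_workflow(order_id: str, steps: list, state: dict) -> None:
    # If `state` is persisted after every step, a restart resumes where
    # the last run left off instead of repeating completed steps.
    for i in range(state.get("next", 0), len(steps)):
        for attempt in range(3):
            try:
                steps[i](order_id)
                break
            except Exception:
                time.sleep(2 ** attempt)  # exponential backoff
        else:
            raise RuntimeError(f"step {i} kept failing; park for review")
        state["next"] = i + 1  # persist progress here in a real system

run_workflow("order-1", [print, print], {})
```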
Proximity-Based Search
Systems like ride-sharing or local delivery need to search for entities by location. Geospatial indexes (PostGIS, Redis geo, Elasticsearch) efficiently query by proximity. Only use dedicated geo indexes when indexing hundreds of thousands or millions of items — for smaller datasets, a simple scan is fine.
- PostgreSQL with PostGIS, Redis geospatial commands, or Elasticsearch geo-queries are the standard tools.
- Divide the geographic area into regions and index entities within them to reduce the search space.
- Most systems don't require global queries — users typically search for entities local to them.
- For fewer than ~1,000 items, scanning all items is simpler and faster than maintaining a purpose-built geo index.
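The "divide into regions" idea from the list above, as a toy grid index: entities are bucketed by a coarse lat/lng cell, and a query scans only the searcher's cell plus its eight neighbors instead of every entity.

```python
import math
from collections import defaultdict

CELL = 0.01  # degrees; roughly 1 km north-south per cell

def cell(lat: float, lng: float) -> tuple[int, int]:
    return (math.floor(lat / CELL), math.floor(lng / CELL))

index: dict[tuple[int, int], list[str]] = defaultdict(list)

def add_entity(entity_id: str, lat: float, lng: float) -> None:
    index[cell(lat, lng)].append(entity_id)

def nearby(lat: float, lng: float) -> list[str]:
    cx, cy = cell(lat, lng)
    return [
        entity
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for entity in index[(cx + dx, cy + dy)]
    ]
```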
Common Anti-Patterns
- Ignoring failure modes: Discuss what happens when things go wrong — server crashes, network partitions, cache misses, thundering herds. Show you think about reliability.
- Neglecting the data model: The schema and access patterns drive your entire architecture. Define tables, indexes, and query patterns early.
- Designing the end state first: Always start simple and iterate. Show the interviewer your thought process by evolving the design from a single-box architecture to a distributed system.
- Monologuing: System design interviews are collaborative. Pause regularly to ask the interviewer if they'd like you to go deeper or move on.
- Skipping requirements: Designing without understanding requirements leads to solving the wrong problem. Spend the first 5-10 minutes scoping.
- Over-engineering: Don't add sharding, microservices, or Kafka on day one. Start with the simplest thing that works and scale incrementally.
Back-of-the-Envelope Cheat Sheet
Many candidates spend several minutes calculating storage, DAU, and QPS at the start of the interview only to conclude "it's a lot" — gaining nothing that influences the design. Back-of-the-envelope math is valuable when it directly shapes a decision (e.g., estimating topic count to decide if a data structure fits on one node, or calculating write throughput to decide if you need sharding). Skip the upfront ritual and do math inline when it actually matters. Tell the interviewer you'll estimate as needed during the design.