Prepare System Design

dennyzhang

October 3, 2025

URL: https://quantcodedenny.com/posts/prepare-system-interview/

System design interviews test not only technical skills but also strategic thinking, end-to-end ownership, scalability awareness, and cross-team influence.

As a strong IC6 (operating between IC6 and IC7), my goal is to bridge execution with strategic design thinking. This post consolidates what I learned from system design prep, especially in ML infra contexts, and it will evolve as I add new insights.

System Design Leveling – 35 mins

IC5

Independently chooses components to tell a coherent story.
Covers end-to-end design with no significant gaps.
Discusses tradeoffs and user impact, with prompting.

IC6

Creates effective designs addressing multiple critical aspects.
Anticipates problems, including maintainability and organizational challenges.
Speaks thoroughly on tradeoffs, bottlenecks, and user impact with minimal prompting.

IC7

Builds sophisticated designs addressing all stated and implicit parts of a problem.
Proactively considers alternative solutions, immediate vs. long-term issues, and business needs.
Covers all tradeoffs: technical, user impact, team dynamics, reliability, and sustainability.

IC6 System Design Interview Framework

1. Problem Structuring

“Just to confirm, we need a system for X with Y latency and Z throughput, correct?”
“What is the expected scale: daily active users, requests per second, or data volume?”
“Do we need strong consistency, or is eventual consistency sufficient?”
“Are there privacy or compliance constraints we need to consider?”
“Let’s break the problem into three main components: ingestion, processing, serving.”
“I want to highlight dependencies and areas that might require cross-team coordination.”
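
The scale question above usually turns into quick back-of-envelope math. Here is a minimal sketch of that arithmetic; the function name, the peak-to-average ratio, and the sample numbers are my own illustrative assumptions, not figures from any real system.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class LoadEstimate:
    avg_qps: float
    peak_qps: float
    daily_storage_gb: float


def estimate_load(daily_active_users: int,
                  requests_per_user_per_day: int,
                  peak_to_avg_ratio: float,
                  bytes_per_request: int) -> LoadEstimate:
    """Turn DAU-style inputs into rough QPS and storage numbers."""
    daily_requests = daily_active_users * requests_per_user_per_day
    avg_qps = daily_requests / SECONDS_PER_DAY
    return LoadEstimate(
        avg_qps=avg_qps,
        # Peak traffic is modeled as a simple multiple of the average (assumption).
        peak_qps=avg_qps * peak_to_avg_ratio,
        daily_storage_gb=daily_requests * bytes_per_request / 1e9,
    )


# Hypothetical inputs: 10M DAU, 20 requests per user per day, 3x peak, 1 KB per request.
estimate = estimate_load(10_000_000, 20, 3.0, 1_000)
print(f"avg ~{estimate.avg_qps:.0f} QPS, peak ~{estimate.peak_qps:.0f} QPS, "
      f"~{estimate.daily_storage_gb:.0f} GB/day")
```

Stating the inputs out loud matters more than the exact result: the interviewer can correct an assumption before it shapes the whole design.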

2. High-Level Solution Design

“At a high level, the system would look like this: [describe layers or components].”
“Data flows from ingestion → processing → storage → serving.”
“Each component is decoupled so that changes in one layer don’t impact others.”
“Let’s deep dive into the processing layer; we could use batch or stream depending on latency requirements.”
“Caching frequently requested data improves latency but introduces invalidation challenges.”
“We should plan for spikes using queues, retries, and circuit breakers.”
“We can segment functionality so individual components can be updated without impacting the system.”
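
When I mention circuit breakers, it helps to be able to sketch one. The version below is a simplified illustration, not a production implementation: the class name, thresholds, and the injectable `clock` parameter are assumptions I chose to keep the example small and testable.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without trying the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout_s:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, so let this one trial call through.
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # Open on reaching the threshold, or re-open if the trial call failed.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

The point to make in the interview is the fail-fast behavior: while the circuit is open, callers get an immediate error instead of piling more load onto a struggling dependency.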

3. Trade-offs & Technical Excellence

“We could optimize for latency, but it would increase operational cost.”
“Sharding improves throughput but complicates cross-partition queries.”
“Eventual consistency reduces latency but requires careful handling of edge cases.”
“Using framework X provides distributed fault tolerance; framework Y has lower latency but higher complexity.”
“Multi-region replication improves availability but introduces latency trade-offs.”
“Subtle risks include version-to-version schema changes; we can handle them via backward-compatible migrations.”
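
To make the backward-compatible migration point concrete, here is a small sketch of a reader that tolerates both old and new record versions. The `UserEvent` type and its `region` field are hypothetical names I made up for illustration.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class UserEvent:
    user_id: str
    action: str
    # Hypothetical field added in a later schema version; the default keeps
    # records written by older producers readable.
    region: str = "unknown"


def parse_event(record: Mapping[str, Any]) -> UserEvent:
    """Parse a record written by either the old or the new producer."""
    return UserEvent(
        user_id=record["user_id"],
        action=record["action"],
        # Missing in old records, so fall back to the default.
        region=record.get("region", "unknown"),
        # Fields this reader does not know about are ignored, which lets
        # newer producers add fields without breaking this consumer.
    )


old_record = {"user_id": "u1", "action": "click"}
new_record = {"user_id": "u2", "action": "view", "region": "eu", "device": "ios"}
print(parse_event(old_record))
print(parse_event(new_record))
```

The rule of thumb this illustrates: add fields with defaults, never repurpose or remove a field until every reader and writer has moved off it.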

4. Communication & Influence

“To summarize, here’s why I made each decision and the trade-offs involved.”
“From a user perspective, this design ensures low latency for the majority of requests.”
“I want to check if my assumptions about scale and growth align with your expectations.”
“I hear your concern; here’s how I’d adjust the design.”
“Let’s revisit the key bottlenecks and ensure the architecture addresses them.”
“For future extensions, modular components can accommodate new requirements without major changes.”

Common trade-offs

Trade-off	What it Means	Key Considerations / Questions to Ask Yourself
Latency vs Throughput	Optimizing for faster responses may reduce total system throughput, and vice versa	How fast must requests complete? Can we batch or async some work? Can we precompute results?
Consistency vs Availability	Strong consistency may slow responses or reduce availability; eventual consistency improves availability but allows stale data	Does the user expect immediate read-your-writes consistency? Which parts of the system can tolerate eventual consistency?
Freshness vs Compute / Cost	Frequent updates improve freshness but increase CPU, I/O, or memory usage	How often do users need updated data? Can some updates be async or cached?
Complexity vs Extensibility	Simple designs are easier to implement, but flexible/modular designs are easier to evolve	Will the system need new features in the future? How can we make it modular without overengineering?
Storage vs Query Performance	Precomputing or denormalizing improves read performance but increases storage cost	Which data should be materialized? Can we compute some things on demand?
Generalization vs Edge-Case Optimization	Optimizing for the common case may hurt edge cases; handling every edge case can increase complexity	What’s the typical user scenario? Are there extreme cases that need special handling?
Observability vs Performance	Metrics, logs, and dashboards aid monitoring but can add latency or storage overhead	What key metrics/SLOs are critical? Can monitoring be async?
Scalability vs Simplicity	Designs that scale to millions/billions often require sharding, async pipelines, and caches, which increase system complexity	What is the expected growth? Can we start simple and evolve, or must it scale from day one?
Security / Privacy vs Usability	Strong security or privacy measures may slow performance or complicate user experience	What are compliance or privacy requirements? How does this affect API design or latency?
Consistency / Correctness vs Cost / Speed	Guaranteeing exact correctness may increase cost or reduce speed	Can approximate results suffice? Which operations require strong guarantees?

Common techniques (news feed example)

Technique	What it does	When to mention
Push + Pull hybrid	Combines fan-out on write (push) with fan-out on read (pull), chosen by follower count	If asked about celebrities or skewed follower counts
Precomputed feed cache	Stores top N posts for a user	Helps meet strict read latency (P95 < 300 ms)
Sharded queues	Partitions per-user feed queues across shards (for example, by user ID)	To scale for millions of users
Asynchronous write pipelines (Kafka, stream processing)	Fan-out writes done asynchronously	Improves throughput and reduces write blocking
Local re-ranking	Lightweight ranking at Serving layer	Adjust freshness, unseen content, or last-second boosts
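
The push + pull hybrid from the table is easier to explain with a sketch. This is a toy in-memory version under my own assumptions: the `FeedService` class, the tiny `CELEBRITY_THRESHOLD`, and the data structures are illustrative only, and it ignores what happens when an author crosses the threshold in either direction.

```python
from collections import defaultdict, deque

CELEBRITY_THRESHOLD = 3  # tiny on purpose; a real cutoff would be far larger
FEED_SIZE = 50           # precomputed feed cache keeps only the newest N posts


class FeedService:
    def __init__(self) -> None:
        self.followers = defaultdict(set)   # author -> users following them
        self.following = defaultdict(set)   # user -> authors they follow
        self.feed_cache = defaultdict(lambda: deque(maxlen=FEED_SIZE))
        self.celebrity_posts = defaultdict(lambda: deque(maxlen=FEED_SIZE))

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author: str) -> bool:
        return len(self.followers[author]) >= CELEBRITY_THRESHOLD

    def publish(self, author: str, ts: int, post: str) -> None:
        if self.is_celebrity(author):
            # Pull path: store once, merge in at read time.
            self.celebrity_posts[author].append((ts, post))
        else:
            # Push path: fan out on write into each follower's cached feed.
            for follower in self.followers[author]:
                self.feed_cache[follower].append((ts, post))

    def read_feed(self, user: str, limit: int = 10) -> list:
        items = list(self.feed_cache[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                items.extend(self.celebrity_posts[author])
        items.sort(key=lambda item: item[0], reverse=True)  # newest first
        return [post for _, post in items[:limit]]


svc = FeedService()
svc.follow("bob", "alice")
for fan in ("bob", "carol", "dave"):
    svc.follow(fan, "star")
svc.publish("alice", 1, "a1")  # pushed into bob's cache
svc.publish("star", 2, "s1")   # stored once, pulled at read time
print(svc.read_feed("bob"))
```

The trade-off to call out: push keeps reads cheap for typical users, while pull avoids writing one celebrity post into millions of caches.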

Known patterns

Pattern	Description / Purpose	Pros	Cons / Trade-offs	Typical Use Cases
Client-Server	Clients send requests, server responds	Simple, clear separation	Can be bottlenecked at server	Web apps, APIs, mobile backends
Load Balancing / Horizontal Scaling	Distribute requests across multiple servers	High availability, fault tolerance	Complexity in routing, sticky sessions	High-traffic APIs, web services
Caching	Store frequently accessed data	Reduces latency, lowers DB load	Cache invalidation complexity, stale data	DB queries, API responses, CDN content
Sharding / Partitioning	Split data across nodes	Scales reads/writes	Harder joins, uneven load	Large user datasets, multi-tenant DBs
Replication	Maintain multiple copies of data	High availability, disaster recovery	Consistency trade-offs	Multi-region DBs, fault-tolerant systems
Event-Driven / Messaging	Asynchronous communication via messages/events	Decoupled, scalable	Message ordering, duplication issues	Logging, feature pipelines, order processing
Microservices / SOA	Decompose monolith into independent services	Independent deployability, scalable per service	Service communication, data consistency	Large apps, ML pipelines, modular backend
Queueing & Backpressure	Smooth spikes, decouple producer/consumer	Handles high load reliably	Requires monitoring, retry & dead-letter handling	Task queues, ingestion pipelines
Rate Limiting / Throttling	Control request rates	Protects backend resources	Can block valid requests if too aggressive	APIs, microservices
Proxy / Gateway	Intermediary for routing, caching, auth	Centralizes cross-cutting concerns	Single point of failure if not highly available	API gateway, reverse proxy, authentication
Leader Election / Consensus	Distributed coordination, single source of truth	Ensures consistency, coordination	Complexity in distributed systems	Distributed locks, master selection, config
Observability	Logging, metrics, tracing	Easier debugging and monitoring	Adds overhead, requires discipline	ML infra, microservices, pipelines
Circuit Breaker / Retry	Protect services from cascading failures	Increases system resiliency	Misconfigured thresholds can block traffic unnecessarily	Microservices, external APIs
Batch vs. Stream Processing	Process data in chunks vs. continuously	Batch: efficient; stream: low latency	Batch: higher latency; stream: complex error handling	ETL jobs, ML feature store updates, analytics
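
As one worked example from the patterns table, rate limiting is commonly implemented with a token bucket. The sketch below is a single-process illustration; the `TokenBucket` name and the injectable `clock` are my assumptions, and a distributed limiter would need shared state that this does not cover.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate_per_s` tokens per second."""

    def __init__(self, rate_per_s: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate_per_s = rate_per_s
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start full so an initial burst is allowed
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill based on elapsed time, never exceeding capacity.
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate_per_s)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False  # caller should reject or delay the request
```

The two knobs map directly to the trade-off in the table: a low rate or small capacity protects the backend but risks blocking valid bursts.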

Core Principles for System Design

1. Problem Structuring

Clarify requirements and constraints (functional & non-functional, SLAs, ownership boundaries, future growth).
Identify critical paths, unknowns, and dependencies.
Use a structured approach to map problem → components → interactions.

2. Trade-Off Awareness

Recognize and quantify trade-offs: latency vs. throughput, consistency vs. availability, cost vs. reliability.
Include business, operational, and cross-team implications.
IC7-level thinking anticipates tangential trade-offs proactively.

3. Known Patterns & Abstractions

Apply reusable patterns: caching, sharding, load balancing, replication, event-driven architectures.
Avoid reinventing solutions; justify deviations clearly.
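
For sharding, the simplest reusable pattern is stable hash-based routing. This is a minimal sketch with a hypothetical `shard_for` helper; it uses plain modulo hashing, so changing the shard count remaps most keys, which is the usual reason to bring up consistent hashing as the alternative.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index that is stable across processes and restarts."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # Python's built-in hash() is salted per process for strings, so use a
    # cryptographic digest to get a stable value instead.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Rough look at how evenly 1,000 hypothetical user IDs spread over 8 shards.
counts = Counter(shard_for(f"user-{i}", 8) for i in range(1000))
print(sorted(counts.items()))
```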

4. Scalability & Reliability

Horizontal vs. vertical scaling.
Fault tolerance, retries, backpressure handling, recovery strategies.
Monitoring, alerts, and observability planning.
Consider future-proofing and maintainability.
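
Retries are worth sketching because naive retries can amplify an outage. Below is a minimal example of exponential backoff with full jitter; the function name and defaults are my own assumptions, and `sleep` and `rng` are injectable only so the behavior can be checked without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(fn: Callable[[], T],
                       max_attempts: int = 4,
                       base_delay_s: float = 0.1,
                       max_delay_s: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep,
                       rng: Callable[[], float] = random.random) -> T:
    """Call fn, retrying failures with exponentially growing, jittered delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            # Simplification: a real caller should retry only errors known
            # to be transient (timeouts, throttling), not every exception.
            if attempt == max_attempts - 1:
                raise
            cap = min(max_delay_s, base_delay_s * 2 ** attempt)
            # Full jitter: sleep a random amount up to the cap so that many
            # clients do not retry in lockstep.
            sleep(rng() * cap)
    raise ValueError("max_attempts must be at least 1")
```

Capping both the attempt count and the delay keeps the worst-case added latency bounded, which is the number I would quote when discussing SLAs.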

5. Communication & Influence

Present a structured narrative: context → problem → options → trade-offs → recommendation.
Highlight strategic impact, not just technical correctness.
Prepare concise “elevator pitches” for directors, PMs, and cross-functional teams.

IC6 System Design Practice Checklist

Requirements & Clarification

Functional vs. non-functional requirements.
Latency, throughput, SLAs/SLOs.
Ownership boundaries & team responsibilities.
Expected growth & future-proofing needs.

High-Level Design

Identify major components & interactions.
Map data flow (ingestion → processing → serving).
Define APIs, interfaces, and abstractions.
Highlight cross-team dependencies.

Scaling & Reliability

Horizontal vs. vertical scaling.
Caching, sharding, partitioning.
Fault tolerance & retries.
Observability: monitoring, alerting, metrics.
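
When discussing metrics and alerting, I find it useful to show how a tail-latency SLO check is computed. This sketch uses the nearest-rank percentile method; the sample latencies and the 300 ms P95 target are made-up values for illustration.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds, with two slow outliers.
latencies_ms = [120, 95, 310, 150, 110, 105, 98, 130, 102, 99,
                101, 97, 140, 125, 115, 108, 112, 118, 122, 900]

P95_TARGET_MS = 300  # assumed SLO threshold
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
slo_met = p95 < P95_TARGET_MS
print(f"p50={p50} ms, p95={p95} ms, SLO met: {slo_met}")
```

The median looks healthy here while the P95 breaches the target, which is why I talk about percentiles rather than averages.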

Tradeoffs & Options

Pros/cons of architectural choices.
Cost vs. performance vs. complexity.
Business impact of each option.

Edge Cases / Failure Modes

Identify points of failure and mitigation strategies.
Discuss backpressure, stale data, network issues.
Recovery, retries, fallback logic.
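
One fallback strategy that ties stale data and recovery together is serving the last known value when the source of truth is unavailable. This is a simplified single-process sketch; the `StaleFallbackCache` class and its interface are assumptions I made for the example.

```python
import time
from typing import Any, Callable, Dict, Tuple


class StaleFallbackCache:
    """Read-through cache that serves expired entries if the loader fails."""

    def __init__(self, loader: Callable[[str], Any], ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries: Dict[str, Tuple[Any, float]] = {}  # key -> (value, fetched_at)

    def get(self, key: str) -> Tuple[Any, bool]:
        """Return (value, is_stale)."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl_s:
            return entry[0], False  # fresh hit
        try:
            value = self._loader(key)
        except Exception:
            if entry is not None:
                # Degrade gracefully: stale data beats an error for many reads.
                return entry[0], True
            raise  # nothing cached, so the failure has to surface
        self._entries[key] = (value, now)
        return value, False
```

Returning the `is_stale` flag lets the caller decide whether stale data is acceptable, for example showing it in a feed but refusing it for a balance check.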

Influence & Communication

Summarize decisions for non-technical stakeholders.
Highlight trade-offs explicitly.
Show strategic impact and maintain leadership presence.

Leadership Presence

Lead calmly and confidently.
Encourage team input while framing final decisions.
Maintain focus on high-leverage improvements.