What We Covered
We used a simple ride-hailing service to explore:
- Requests, throughput, latency, and concurrency - how they relate
- Bottlenecks - how they appear as load grows
- Queues - why they exist and when they help
- Backpressure - how to prevent overload
- Scaling - vertical vs horizontal, and their limits
Key Concepts Recap
Latency: Time for one request to complete. Lower is better.
Throughput: Requests handled per second. Higher is better.
Concurrency: Requests in flight simultaneously. Roughly throughput × latency (Little's Law).
Queues: Form when requests arrive faster than capacity. Can help smooth spikes, but can also grow unbounded.
Backpressure: Reject or throttle requests when overloaded. Better than crashing.
Scaling: Vertical (stronger machine) or horizontal (more machines). Each has limits.
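The latency/throughput/concurrency relationship above is Little's Law. A quick sketch with illustrative numbers (not measurements from the ride-hailing example):

```python
# Little's Law: average concurrency = throughput (req/s) * latency (s).
throughput = 200   # requests per second (illustrative)
latency = 0.05     # 50 ms average per request (illustrative)

concurrency = throughput * latency
print(concurrency)  # -> 10.0 requests in flight on average
```

If latency doubles under load while throughput stays flat, in-flight requests double too, which is why latency growth quietly consumes worker capacity.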
Practical Checklist
Use this checklist for your own systems:
1. Can I explain my system as requests, capacity, and queues?
Questions to ask:
- What are the requests? (API calls, jobs, events)
- What’s the capacity? (requests per second per instance)
- Where do queues form? (in-process, message queue, database)
If you can’t answer these, start here. Map out the request flow and identify capacity limits.
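One way to start answering the capacity question is a back-of-envelope estimate from worker count and average service time. The numbers here are hypothetical placeholders, not a benchmark:

```python
# Rough per-instance capacity estimate (hypothetical numbers).
workers = 8          # concurrent request handlers per instance
service_time = 0.04  # seconds per request on average

# Each worker completes 1/service_time requests per second.
capacity = workers / service_time
print(capacity)  # -> 200.0 requests/second per instance
```

A queue starts forming anywhere arrival rate exceeds this number, so the estimate also tells you where to look for queues.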
2. What is my real bottleneck today?
Questions to ask:
- Where is time spent? (CPU, database, network, external services)
- What limits throughput? (worker capacity, database writes, API rate limits)
- How do I measure it? (monitoring, profiling, load testing)
Identify the bottleneck before scaling. Scaling the wrong thing wastes money and doesn’t help.
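A crude but effective first step for "where is time spent?" is wall-clock timing around each stage of a request. This is a minimal sketch; the stage names are hypothetical and the `time.sleep` calls stand in for real work:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    """Print wall-clock time spent inside the block, in milliseconds."""
    start = time.perf_counter()
    yield
    print(f"{label}: {(time.perf_counter() - start) * 1000:.1f} ms")

# Hypothetical request handler broken into stages:
with timed("db_query"):
    time.sleep(0.030)  # stands in for a database call
with timed("render"):
    time.sleep(0.005)  # stands in for response serialization
```

If one stage dominates, that is your candidate bottleneck; a profiler or APM tool then confirms it before you spend money scaling.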
3. What happens during a traffic spike?
Questions to ask:
- How does latency change? (stays flat, grows linearly, explodes)
- Do queues form? (where, how fast do they grow)
- Do requests time out? (at what load, what percentage)
Test this. Run load tests. See what happens at 2x, 5x, 10x normal load.
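Before running real load tests, a toy simulation shows the basic dynamic: any sustained load above capacity makes the backlog grow without bound. Numbers are illustrative:

```python
# Toy per-second queue simulation: backlog after a 60-second spike
# at various multiples of a 100 req/s capacity (illustrative numbers).
capacity = 100  # requests/second the system can process

for multiplier in (1, 2, 5, 10):
    arrivals = 100 * multiplier  # requests/second during the spike
    backlog = 0
    for _ in range(60):          # one minute of sustained spike
        backlog += arrivals
        backlog -= min(backlog, capacity)
    print(f"{multiplier}x load -> backlog after 60s: {backlog}")
    # -> 0, 6000, 24000, 54000
```

At 1x the queue stays empty; at anything above 1x it grows linearly forever, which is exactly the "latency explodes" failure mode a load test will surface.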
4. Do I have any backpressure today?
Questions to ask:
- Do I reject requests when overloaded? (429, 503 errors)
- Do I throttle requests? (rate limiting)
- What happens when the queue is full? (reject, wait, crash)
If not, add it. Backpressure prevents crashes and keeps systems responsive.
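The simplest form of backpressure is a bounded queue that rejects new work instead of growing without limit. A minimal sketch (the `submit` helper is hypothetical; in a web service the `False` path would map to a 429 or 503 response):

```python
import queue

# Bounded queue: at most 100 requests waiting at once.
requests = queue.Queue(maxsize=100)

def submit(req):
    """Return False (caller sends 429/503) when the queue is full."""
    try:
        requests.put_nowait(req)
        return True
    except queue.Full:
        return False

accepted = sum(submit(i) for i in range(150))
print(accepted)  # -> 100 accepted, the other 50 rejected
```

Rejecting 50 requests quickly is a better outcome than accepting all 150 and timing out on most of them.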
5. How would I scale this if traffic grew 5×?
Questions to ask:
- Can I scale vertically? (stronger machine, how much)
- Can I scale horizontally? (more instances, load balancing)
- What becomes the bottleneck? (database, external services, coordination)
Plan ahead. Know your scaling strategy before you need it.
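Part of planning ahead is a back-of-envelope instance count. A sketch with illustrative numbers (real capacity and target utilization come from your own measurements):

```python
import math

# Horizontal-scaling estimate (illustrative numbers).
per_instance = 120  # requests/second one instance sustains
headroom = 0.7      # run instances at ~70% utilization, not 100%

def instances_needed(load):
    """Instances required to serve `load` req/s with headroom."""
    return math.ceil(load / (per_instance * headroom))

print(instances_needed(200))      # today       -> 3
print(instances_needed(200 * 5))  # at 5x load  -> 12
```

Note the estimate only covers the stateless tier; the shared database or an external API often becomes the new bottleneck well before instance count does.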
Next Steps
Apply this to your own system:
- Map your request flow - Draw it out. Where does time go?
- Measure capacity - What’s your actual throughput? Where are the limits?
- Test under load - What happens at 2x, 5x, 10x normal load?
- Add backpressure - If you don’t have it, add it. Start simple.
- Plan scaling - Know your strategy before you need it.
Learn more:
- Load testing tools: k6, Apache JMeter, Locust
- Monitoring: Prometheus, Datadog, New Relic
- Queue systems: RabbitMQ, Kafka, AWS SQS
- Scaling patterns: Read about other companies’ scaling stories
Final Thoughts
System design is about thinking in terms of capacity, load, and bottlenecks.
- Start simple
- Measure everything
- Add complexity only when needed
- Solve one bottleneck at a time
The ride-hailing example is simple, but the concepts apply everywhere: web APIs, batch jobs, real-time systems, data pipelines.
Remember: Solving one bottleneck just reveals the next one. That’s normal. Keep solving them one at a time.