
What We Covered

We used a simple ride-hailing service to explore:

  • Requests, throughput, latency, and concurrency - how they relate
  • Bottlenecks - how they appear as load grows
  • Queues - why they exist and when they help
  • Backpressure - how to prevent overload
  • Scaling - vertical vs horizontal, and their limits

Key Concepts Recap

Latency: Time for one request to complete. Lower is better.

Throughput: Requests handled per second. Higher is better.

Concurrency: Requests in flight simultaneously. By Little's Law, average concurrency equals throughput × latency.

Queues: Form when requests arrive faster than capacity. Can help smooth spikes, but can also grow unbounded.

Backpressure: Reject or throttle requests when overloaded. Better than crashing.

Scaling: Vertical (stronger machine) or horizontal (more machines). Each has limits.
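The concurrency relationship in the recap above is Little's Law: average concurrency = throughput × average latency. A quick sketch with made-up numbers:

```python
# Little's Law: average concurrency = throughput * average latency.
# The numbers below are illustrative, not measurements.
throughput_rps = 100   # requests handled per second
latency_s = 0.2        # average time per request, in seconds

concurrency = throughput_rps * latency_s
print(concurrency)     # 20 requests in flight on average
```

This also works in reverse: if you know your latency and how many requests you can hold in flight, you can estimate the throughput ceiling.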

Practical Checklist

Use this checklist for your own systems:

1. Can I explain my system as requests, capacity, and queues?

Questions to ask:

  • What are the requests? (API calls, jobs, events)
  • What’s the capacity? (requests per second per instance)
  • Where do queues form? (in-process, message queue, database)

If you can’t answer these, start here. Map out the request flow and identify capacity limits.

2. What is my real bottleneck today?

Questions to ask:

  • Where is time spent? (CPU, database, network, external services)
  • What limits throughput? (worker capacity, database writes, API rate limits)
  • How do I measure it? (monitoring, profiling, load testing)

Identify the bottleneck before scaling. Scaling the wrong thing wastes money and doesn’t help.
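One way to answer "where is time spent?" is to time each stage of a request and compare totals. A minimal sketch using Python's `time.perf_counter`; the stage names are hypothetical and `time.sleep` stands in for real work:

```python
import time

def timed(stage_times, name, fn):
    """Run fn and accumulate its wall-clock time under `name`."""
    start = time.perf_counter()
    result = fn()
    stage_times[name] = stage_times.get(name, 0.0) + time.perf_counter() - start
    return result

# Hypothetical request handler split into stages.
def handle_request(stage_times):
    timed(stage_times, "parse", lambda: time.sleep(0.001))
    timed(stage_times, "db_query", lambda: time.sleep(0.02))  # stand-in for a DB call
    timed(stage_times, "render", lambda: time.sleep(0.002))

stage_times = {}
for _ in range(10):
    handle_request(stage_times)

# The stage with the largest total time is the first candidate bottleneck.
bottleneck = max(stage_times, key=stage_times.get)
print(bottleneck, stage_times[bottleneck])
```

In a real system you would get the same breakdown from a profiler or tracing tool rather than hand-rolled timers, but the analysis is identical: find where the time actually goes before you optimize anything.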

3. What happens during a traffic spike?

Questions to ask:

  • How does latency change? (stays flat, grows linearly, explodes)
  • Do queues form? (where, how fast do they grow)
  • Do requests time out? (at what load, what percentage)

Test this. Run load tests. See what happens at 2x, 5x, 10x normal load.
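The queue behavior above can be sketched with a toy simulation: a fixed capacity, arrival rates at 1x, 2x, and 5x, and the backlog left after ten seconds. The numbers are illustrative assumptions, not measurements:

```python
# Toy simulation: each second, new requests arrive and up to
# `capacity_rps` of the queue is served. Returns the leftover backlog.
def simulate(arrival_rps, capacity_rps, seconds=10):
    queue = 0
    for _ in range(seconds):
        queue += arrival_rps                # requests arriving this second
        queue -= min(queue, capacity_rps)   # requests served this second
    return queue

capacity = 100
for multiplier in (1, 2, 5):
    backlog = simulate(arrival_rps=100 * multiplier, capacity_rps=capacity)
    print(f"{multiplier}x load -> backlog of {backlog} after 10s")
```

At 1x the queue stays empty; at 2x and 5x it grows without bound, which is exactly the pattern to look for in a real load test.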

4. Do I have any backpressure today?

Questions to ask:

  • Do I reject requests when overloaded? (429, 503 errors)
  • Do I throttle requests? (rate limiting)
  • What happens when the queue is full? (reject, wait, crash)

If no, add it. Backpressure prevents crashes and keeps systems responsive.

5. How would I scale this if traffic grew 5×?

Questions to ask:

  • Can I scale vertically? (stronger machine, how much)
  • Can I scale horizontally? (more instances, load balancing)
  • What becomes the bottleneck? (database, external services, coordination)

Plan ahead. Know your scaling strategy before you need it.
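For horizontal scaling, a back-of-envelope calculation answers "how many instances at 5×?". All numbers here are illustrative assumptions; substitute your own measurements:

```python
import math

# Back-of-envelope instance count for 5x traffic.
current_rps = 400
per_instance_capacity = 120   # measured max rps for one instance
headroom = 0.7                # run instances at ~70% to absorb spikes

target_rps = current_rps * 5
instances = math.ceil(target_rps / (per_instance_capacity * headroom))
print(instances)  # instances needed at 5x traffic
```

A calculation like this also surfaces the next question from the checklist: at that instance count, does the database or an external service become the new bottleneck?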

Next Steps

Apply this to your own system:

  1. Map your request flow - Draw it out. Where does time go?
  2. Measure capacity - What’s your actual throughput? Where are the limits?
  3. Test under load - What happens at 2x, 5x, 10x normal load?
  4. Add backpressure - If you don’t have it, add it. Start simple.
  5. Plan scaling - Know your strategy before you need it.

Learn more:

  • Load testing tools: k6, Apache JMeter, Locust
  • Monitoring: Prometheus, Datadog, New Relic
  • Queue systems: RabbitMQ, Kafka, AWS SQS
  • Scaling patterns: Read about other companies’ scaling stories

Final Thoughts

System design is about thinking in terms of capacity, load, and bottlenecks.

  • Start simple
  • Measure everything
  • Add complexity only when needed
  • Solve one bottleneck at a time

The ride-hailing example is simple, but the concepts apply everywhere: web APIs, batch jobs, real-time systems, data pipelines.

Remember: Solving one bottleneck just reveals the next one. That’s normal. Keep solving them one at a time.