What We Covered
We used a simple ride-hailing service to explore:
- Requests, throughput, latency, and concurrency - how they relate
- Bottlenecks - how they appear as load grows
- Queues - why they exist and when they help
- Backpressure - how to prevent overload
- Scaling - vertical vs horizontal, and their limits
Key Concepts Recap
Latency: Time for one request to complete. Lower is better.
Throughput: Requests handled per second. Higher is better.
Concurrency: Requests in flight simultaneously. Roughly throughput × latency (Little's Law).
Queues: Form when requests arrive faster than capacity. Can help smooth spikes, but can also grow unbounded.
Backpressure: Reject or throttle requests when overloaded. Better than crashing.
Scaling: Vertical (stronger machine) or horizontal (more machines). Each has limits.
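The latency/throughput/concurrency relationship above is Little's Law. A quick sketch with illustrative numbers (not measurements from the ride-hailing example):

```python
# Little's Law: average concurrency = throughput (req/s) * latency (s).
throughput = 200   # requests per second (illustrative)
latency = 0.05     # 50 ms average per request (illustrative)

concurrency = throughput * latency
print(concurrency)  # -> 10.0 requests in flight on average
```

If latency doubles under load while throughput stays flat, in-flight requests double too, which is why latency growth quietly consumes worker capacity.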
Practical Checklist
Use this checklist for your own systems:
1. Can I explain my system as requests, capacity, and queues?
Questions to ask:
- What are the requests? (API calls, jobs, events)
- What’s the capacity? (requests per second per instance)
- Where do queues form? (in-process, message queue, database)
If you can’t answer these, start here. Map out the request flow and identify capacity limits.
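One way to start answering the capacity question is a back-of-envelope estimate from worker count and average service time. The numbers here are hypothetical placeholders, not a benchmark:

```python
# Rough per-instance capacity estimate (hypothetical numbers).
workers = 8          # concurrent request handlers per instance
service_time = 0.04  # seconds per request on average

# Each worker completes 1/service_time requests per second.
capacity = workers / service_time
print(capacity)  # -> 200.0 requests/second per instance
```

A queue starts forming anywhere arrival rate exceeds this number, so the estimate also tells you where to look for queues.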
2. What is my real bottleneck today?
Questions to ask:
- Where is time spent? (CPU, database, network, external services)
- What limits throughput? (worker capacity, database writes, API rate limits)
- How do I measure it? (monitoring, profiling, load testing)
Identify the bottleneck before scaling. Scaling the wrong thing wastes money and doesn’t help.
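A crude but effective first step for "where is time spent?" is wall-clock timing around each stage of a request. This is a minimal sketch; the stage names are hypothetical and the `time.sleep` calls stand in for real work:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    """Print wall-clock time spent inside the block, in milliseconds."""
    start = time.perf_counter()
    yield
    print(f"{label}: {(time.perf_counter() - start) * 1000:.1f} ms")

# Hypothetical request handler broken into stages:
with timed("db_query"):
    time.sleep(0.030)  # stands in for a database call
with timed("render"):
    time.sleep(0.005)  # stands in for response serialization
```

If one stage dominates, that is your candidate bottleneck; a profiler or APM tool then confirms it before you spend money scaling.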
3. What happens during a traffic spike?
Questions to ask:
- How does latency change? (stays flat, grows linearly, explodes)
- Do queues form? (where, how fast do they grow)
- Do requests time out? (at what load, what percentage)
Test this. Run load tests. See what happens at 2x, 5x, 10x normal load.
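Before running real load tests, a toy simulation shows the basic dynamic: any sustained load above capacity makes the backlog grow without bound. Numbers are illustrative:

```python
# Toy per-second queue simulation: backlog after a 60-second spike
# at various multiples of a 100 req/s capacity (illustrative numbers).
capacity = 100  # requests/second the system can process

for multiplier in (1, 2, 5, 10):
    arrivals = 100 * multiplier  # requests/second during the spike
    backlog = 0
    for _ in range(60):          # one minute of sustained spike
        backlog += arrivals
        backlog -= min(backlog, capacity)
    print(f"{multiplier}x load -> backlog after 60s: {backlog}")
    # -> 0, 6000, 24000, 54000
```

At 1x the queue stays empty; at anything above 1x it grows linearly forever, which is exactly the "latency explodes" failure mode a load test will surface.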
4. Do I have any backpressure today?
Questions to ask:
- Do I reject requests when overloaded? (429, 503 errors)
- Do I throttle requests? (rate limiting)
- What happens when the queue is full? (reject, wait, crash)
If not, add it. Backpressure prevents crashes and keeps systems responsive.
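The simplest form of backpressure is a bounded queue that rejects new work instead of growing without limit. A minimal sketch (the `submit` helper is hypothetical; in a web service the `False` path would map to a 429 or 503 response):

```python
import queue

# Bounded queue: at most 100 requests waiting at once.
requests = queue.Queue(maxsize=100)

def submit(req):
    """Return False (caller sends 429/503) when the queue is full."""
    try:
        requests.put_nowait(req)
        return True
    except queue.Full:
        return False

accepted = sum(submit(i) for i in range(150))
print(accepted)  # -> 100 accepted, the other 50 rejected
```

Rejecting 50 requests quickly is a better outcome than accepting all 150 and timing out on most of them.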
5. How would I scale this if traffic grew 5×?
Questions to ask:
- Can I scale vertically? (stronger machine, how much)
- Can I scale horizontally? (more instances, load balancing)
- What becomes the bottleneck? (database, external services, coordination)
Plan ahead. Know your scaling strategy before you need it.
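Part of planning ahead is a back-of-envelope instance count. A sketch with illustrative numbers (real capacity and target utilization come from your own measurements):

```python
import math

# Horizontal-scaling estimate (illustrative numbers).
per_instance = 120  # requests/second one instance sustains
headroom = 0.7      # run instances at ~70% utilization, not 100%

def instances_needed(load):
    """Instances required to serve `load` req/s with headroom."""
    return math.ceil(load / (per_instance * headroom))

print(instances_needed(200))      # today       -> 3
print(instances_needed(200 * 5))  # at 5x load  -> 12
```

Note the estimate only covers the stateless tier; the shared database or an external API often becomes the new bottleneck well before instance count does.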
Next Steps
Apply this to your own system:
- Map your request flow - Draw it out. Where does time go?
- Measure capacity - What’s your actual throughput? Where are the limits?
- Test under load - What happens at 2x, 5x, 10x normal load?
- Add backpressure - If you don’t have it, add it. Start simple.
- Plan scaling - Know your strategy before you need it.
Learn more:
- Load testing tools: k6, Apache JMeter, Locust
- Monitoring: Prometheus, Datadog, New Relic
- Queue systems: RabbitMQ, Kafka, AWS SQS
- Scaling patterns: Read about other companies’ scaling stories
Final Thoughts
System design is about thinking in terms of capacity, load, and bottlenecks.
- Start simple
- Measure everything
- Add complexity only when needed
- Solve one bottleneck at a time
The ride-hailing example is simple, but the concepts apply everywhere: web APIs, batch jobs, real-time systems, data pipelines.
Remember: Solving one bottleneck just reveals the next one. That’s normal. Keep solving them one at a time.