Intermediate 25 min

From One Request to Many

So far we’ve looked at one request. Now let’s add traffic.

What if 10 users per second request a ride?

What if 200 users per second do it?

Let’s see what happens.

The Intuition

Think of a cashier at a store. One cashier can serve about 5 customers per minute. If 20 customers arrive per minute, a queue forms. The store door still lets people in, but they wait longer.

Same thing happens with our dispatcher. If requests arrive faster than we can process them, they queue up inside the dispatcher process.

Interactive Traffic Simulator

Adjust the traffic slider and watch what happens:

Note: Interactive features are being added. The sliders and visualizations will be fully functional soon.

Traffic Control (initial readings):

  • Capacity: 5 req/s
  • Queue Length: 0
  • Avg Latency: 200ms
  • Status: Healthy

Queue Visualizer

What Happens as Load Grows

Low load (1-5 req/s):

  • No queue forms
  • Latency stays at ~200ms
  • System is comfortable

Medium load (6-15 req/s):

  • Queue starts forming
  • Latency grows: 200ms + queue wait time
  • System is stressed but functional

High load (16-30 req/s):

  • Queue grows quickly
  • Latency can reach several seconds
  • Some requests may time out

Very high load (30+ req/s):

  • Queue grows without bound
  • Requests start timing out
  • System may become unresponsive
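The four regimes above can be reproduced with a tiny discrete-time simulation. This is a sketch, not a real dispatcher: it assumes a constant arrival rate each second and the fixed capacity of 5 req/s from this page (the rates 3, 10, 20, and 40 are chosen to land in each regime):

```python
CAPACITY = 5  # requests processed per second by one instance


def simulate(arrival_rate, seconds=10):
    """Return the queue length after each second of constant load."""
    queue = 0
    history = []
    for _ in range(seconds):
        queue += arrival_rate           # new requests arrive
        queue -= min(queue, CAPACITY)   # process up to capacity
        history.append(queue)
    return history


for rate in (3, 10, 20, 40):  # one rate per load regime
    print(f"{rate} req/s -> queue over time: {simulate(rate)}")
```

At 3 req/s the queue stays empty; at any rate above capacity it grows every second, and the growth rate is exactly the excess arrival rate.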

The Math

If each request takes 200ms:

  • One instance capacity: 5 req/s
  • If 20 requests arrive per second:
    • 5 are processed immediately
    • 15 queue up
    • Queue grows by 15 per second
    • After 10 seconds, 150 requests are queued
    • Each queued request waits: queue position × 200ms
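The arithmetic above can be checked in a few lines of Python, using the same numbers from this example (200ms per request, 20 req/s arriving):

```python
SERVICE_TIME = 0.2           # 200ms per request
CAPACITY = 1 / SERVICE_TIME  # 5 req/s for one instance
ARRIVAL_RATE = 20            # req/s in this example

excess = ARRIVAL_RATE - CAPACITY   # queue growth per second
queued_after_10s = excess * 10

# A request that joins the queue at position n waits n * 200ms
# while everything ahead of it is processed.
wait_at_back = queued_after_10s * SERVICE_TIME

print(excess, queued_after_10s, wait_at_back)
```

After 10 seconds the request at the back of the queue is looking at a 30-second wait before it is even processed.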

Real-World Analogy

Think of a highway toll booth:

  • One booth can process 5 cars per minute
  • If 20 cars arrive per minute, 15 wait
  • The line grows longer
  • Cars at the back wait much longer

Same with our dispatcher. Requests at the back of the queue wait much longer than requests at the front.

When Requests Time Out

Most clients have timeout settings:

  • Mobile apps: 5-10 seconds
  • Web browsers: 30 seconds
  • API clients: varies

If latency exceeds the timeout, the client gives up. The request might still be processing on the server, but the user sees an error.
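You can combine the queue math with the timeout values above to find the point where requests start failing. A minimal sketch, assuming the 200ms service time from this page and two illustrative client timeouts (the names and values here are examples, not a real client library):

```python
SERVICE_TIME = 0.2  # seconds per request, as above

# Illustrative client timeouts from the list above.
TIMEOUTS = {"mobile app": 5.0, "web browser": 30.0}


def total_latency(queue_position):
    """Wait for everyone ahead in the queue, then your own 200ms."""
    return queue_position * SERVICE_TIME + SERVICE_TIME


for client, timeout in TIMEOUTS.items():
    # Deepest queue position that still finishes before the client gives up.
    deepest_ok = int(timeout / SERVICE_TIME) - 1
    print(f"{client}: requests beyond position {deepest_ok} will time out")
```

With a 5-second timeout, anything deeper than position 24 in the queue is already doomed when it arrives, even though the server will still process it eventually.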

Key Takeaways

  • Queues form when requests arrive faster than capacity
  • Latency grows as queue length increases
  • Requests time out when latency exceeds the client timeout
  • Capacity limits determine when queues form
  • If capacity is 5 req/s and 20 arrive, queue grows by 15 per second

What’s Next?

In the next page, we’ll introduce a message queue. This changes the architecture and lets us smooth out traffic spikes. You’ll see how queues can help, but also create new problems.