
The Simplest Version

Let’s start with the simplest possible system:

  • Single dispatcher service
  • Synchronous request handling
  • Everything on one instance

The Request Flow

Here’s what happens when a user requests a ride:

Client → Dispatcher → DB Lookups → Driver Match → Response

Step by step:

  1. Client sends request: User taps “Request Ride” button
  2. Dispatcher receives: API endpoint receives the request
  3. DB lookups: Dispatcher queries database for:
    • Available drivers near the user
    • User’s ride history
    • Pricing rules
  4. Driver match: Dispatcher selects best driver
  5. Response: Dispatcher sends confirmation back to client

Everything happens in sequence. The client waits for the entire process to finish.
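To make the sequence concrete, here's a minimal Python sketch of the handler. Every name in it (find_nearby_drivers, select_best_driver, and so on) is made up for illustration; the point is the shape of the flow, with each step blocking until the previous one finishes.

```python
# Hypothetical stubs standing in for real database queries.
def find_nearby_drivers(location):
    return [{"id": "d1", "eta_min": 4}, {"id": "d2", "eta_min": 7}]

def get_ride_history(user_id):
    return []  # would return past rides in a real system

def get_pricing_rules(location):
    return {"base_fare": 3.0}

def select_best_driver(drivers):
    # Simplest possible matching rule: lowest ETA wins.
    return min(drivers, key=lambda d: d["eta_min"])

def handle_ride_request(user_id, location):
    # Step 3: DB lookups, one after another.
    drivers = find_nearby_drivers(location)
    history = get_ride_history(user_id)   # would inform matching in a real system
    pricing = get_pricing_rules(location)
    # Step 4: driver match.
    driver = select_best_driver(drivers)
    # Step 5: response. The client has been waiting this whole time.
    return {"driver_id": driver["id"], "fare": pricing["base_fare"]}

print(handle_ride_request("u42", (37.77, -122.42)))
```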

Where Time Is Spent

Let’s break down where time goes in a typical request:

Network time: 50ms

  • Client to dispatcher: 25ms
  • Dispatcher to client: 25ms

Dispatcher logic: 50ms

  • Request parsing and validation
  • Business logic execution

Database time: 100ms

  • Query execution
  • Result processing

Total latency: ~200ms

This is a simplified model, but it’s close enough for learning.
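If you want to poke at the numbers, here's a tiny simulation of one request, assuming the stage budgets above (time.sleep stands in for each stage; the values are this page's model, not measurements of a real system):

```python
import time

def simulate_request() -> float:
    """Run one synchronous request using the stage budgets above."""
    start = time.perf_counter()
    time.sleep(0.025)  # network: client → dispatcher
    time.sleep(0.050)  # dispatcher logic: parsing, validation, business logic
    time.sleep(0.100)  # database: query execution and result processing
    time.sleep(0.025)  # network: dispatcher → client
    return (time.perf_counter() - start) * 1000  # total latency in ms

print(f"Latency: {simulate_request():.0f}ms")  # ~200ms
```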

Interactive Timeline

Watch a single request flow through the system:

[Interactive timeline: Request (25ms) → Query (100ms) → Match (50ms) → Confirm (25ms), flowing from Client to Dispatcher to Database to Driver Match and back as the Response]

Key Definitions

Latency: Time for one request to complete. If a request takes 200ms from start to finish, latency is 200ms.

Throughput: Requests handled per second. If each request takes 200ms, one instance can finish about 5 requests per second (1000ms ÷ 200ms = 5).

Concurrency: How many requests are “in flight” at once. If requests take 200ms and 10 arrive per second, you’ll have about 2 requests being processed simultaneously on average.
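These three quantities are linked by Little's Law: concurrency = arrival rate × latency. A quick check with the numbers above:

```python
latency_s = 0.200    # 200ms per request
arrival_rate = 10.0  # requests arriving per second

# Little's Law: in-flight requests = arrival rate × latency
concurrency = arrival_rate * latency_s
print(concurrency)   # 2.0 requests in flight, on average

# One instance handling requests one at a time can't exceed:
max_throughput = 1 / latency_s
print(max_throughput)  # 5.0 requests per second
```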

Latency Breakdown

Hover over each stage to see where time is spent:

[Interactive chart: Network 50ms · Dispatcher Logic 50ms · Database 100ms]

The Math

If each request takes 200ms:

  • One instance can process: 1000ms ÷ 200ms = 5 requests per second
  • This is the maximum throughput for one instance

If requests arrive faster than 5 per second, they’ll start queuing. We’ll see that next.
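Here's that ceiling in code, assuming a steady arrival rate and one instance processing requests one at a time:

```python
latency_s = 0.200
max_throughput = 1 / latency_s  # 5 req/s ceiling for one instance

for arrival_rate in (3, 5, 8):
    # Anything above the ceiling has nowhere to go but a queue.
    backlog_growth = max(0.0, arrival_rate - max_throughput)
    print(f"{arrival_rate} req/s in → queue grows by {backlog_growth:.0f} req/s")

# 3 req/s in → queue grows by 0 req/s  (headroom)
# 5 req/s in → queue grows by 0 req/s  (saturated)
# 8 req/s in → queue grows by 3 req/s  (requests pile up)
```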

Key Takeaways

  • Latency is the time for one request to complete
  • Throughput is requests handled per second
  • Concurrency is how many requests are in flight
  • If latency is 200ms, max throughput is about 5 req/s per instance
  • Network, dispatcher logic, and database all contribute to latency

What’s Next?

On the next page, we'll add load and watch bottlenecks appear. You'll see how queues form, latency grows, and requests start timing out as traffic increases.