Adding a Queue
Now we add an actual message queue between the API and the worker.
New Architecture
Before (Direct Call):
Client → API → Worker → DB → Response
After (Queued):
Client → API → Queue → Worker → DB → External Services
The API becomes “fire-and-forget” relative to the worker. It accepts requests and puts them in a queue. The worker processes them asynchronously.
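As a concrete sketch, here is what a fire-and-forget handler might look like in Python. `handle_request` and the in-process `queue.Queue` are illustrative stand-ins: a real system would use a web framework handler and an external broker (Redis, RabbitMQ, SQS, etc.).

```python
import queue

# Illustrative in-process queue; a real deployment would use an
# external broker so jobs survive API restarts.
job_queue = queue.Queue()

def handle_request(payload):
    """Hypothetical API handler: enqueue the job, acknowledge immediately."""
    job_queue.put(payload)         # fire-and-forget: no waiting on the worker
    return {"status": "accepted"}  # quick acknowledgment to the client
```

The handler's latency is now just the cost of an enqueue, independent of how long processing takes.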
Why Add Queues?
Smooth spikes:
- If 200 requests arrive in one second, the queue absorbs them
- Workers process at a steady rate
- System doesn’t crash
Avoid losing requests:
- If worker restarts, requests stay in queue
- No requests lost during restarts
- Better reliability
Decouple components:
- API doesn’t wait for worker
- API responds quickly
- Worker processes at its own pace
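The decoupling can be sketched with a worker thread that drains the queue at its own pace. This is a minimal in-process sketch; the doubling step stands in for real processing work:

```python
import queue
import threading

job_queue = queue.Queue()
results = []

def worker():
    # The worker pulls jobs at its own pace, independent of the producer.
    while True:
        job = job_queue.get()
        if job is None:              # sentinel: shut down cleanly
            break
        results.append(job * 2)      # placeholder for real processing
        job_queue.task_done()

t = threading.Thread(target=worker)
t.start()

# Producer side: enqueue five jobs and move on immediately.
for i in range(5):
    job_queue.put(i)

job_queue.join()                     # wait only so the demo can print results
job_queue.put(None)
t.join()
print(results)                       # [0, 2, 4, 6, 8]
```

The producer loop finishes as soon as the jobs are enqueued; only the demo's final `join` waits for processing.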
What Changes
API behavior:
- Before: API waits for worker to finish
- After: API puts request in queue and responds immediately
User experience:
- Before: User waits for full processing
- After: User gets quick acknowledgment, processing happens in background
End-to-end time:
- Users still care about total time
- But the shape is different: fast acknowledgment, then background processing
Interactive Architecture Toggle
[Interactive widget: toggle between the direct-call and queued architectures to see the difference in request flow.]
Producer and Consumer
Now we have two rates to think about:
Producer rate: requests per second into the queue (from the API)
Consumer rate: jobs per second a worker can process
If producer rate > consumer rate, the queue grows.
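This relationship is easy to see in a small discrete-time simulation (an illustrative sketch, one tick per second):

```python
def simulate_queue(producer_rate, consumer_rate, seconds):
    """Return the queue length after each second of simulated time."""
    length = 0
    history = []
    for _ in range(seconds):
        length += producer_rate                  # jobs arriving this second
        length = max(0, length - consumer_rate)  # jobs processed this second
        history.append(length)
    return history

# Producer faster than consumer: the queue grows every second.
print(simulate_queue(producer_rate=10, consumer_rate=8, seconds=5))
# -> [2, 4, 6, 8, 10]

# Consumer keeps up: the queue stays empty.
print(simulate_queue(producer_rate=5, consumer_rate=8, seconds=5))
# -> [0, 0, 0, 0, 0]
```

With a 2 jobs/second surplus, the queue grows linearly and never drains.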
Interactive Producer/Consumer Simulator
[Interactive widget: adjust the producer and consumer rates and watch the queue length change over time.]
New Trade-offs
Higher peak throughput possible:
- Queue absorbs spikes
- Workers process steadily
- Can handle bursts
But if consumers can’t keep up:
- Queue grows without bound
- Requests wait longer
- Memory usage increases
- System may crash
You need backpressure:
- Stop accepting new work when the queue is too long
- Reject requests when overloaded
- We’ll cover this next
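A minimal form of backpressure is a bounded queue that rejects work when full. This is an illustrative sketch; `handle_request` and the capacity of 100 are hypothetical:

```python
import queue

# Bounded queue: once it is full, new work is rejected instead of piling up.
job_queue = queue.Queue(maxsize=100)

def handle_request(payload):
    try:
        job_queue.put_nowait(payload)  # enqueue without blocking
        return {"status": "accepted"}
    except queue.Full:
        # Shed load: ask the client to retry later (HTTP 429 semantics).
        return {"status": "rejected", "retry_after": 1}
```

Rejecting early keeps memory bounded and gives clients an honest signal instead of an ever-growing wait.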
Key Takeaways
- Queues smooth spikes by absorbing bursts
- API becomes fire-and-forget: it responds quickly
- Workers process at a steady rate, decoupled from the API
- Queue grows if producer rate > consumer rate
- Need backpressure to prevent unbounded growth
What’s Next?
In the next page, we’ll add backpressure and load shedding. You’ll learn how to prevent queues from growing without bound and how to gracefully handle overload.