Adding a Queue
Now we add an actual message queue between the API and the worker.
New Architecture
Before (Direct Call):
Client → API → Worker → DB → Response
After (Queued):
Client → API → Queue → Worker → DB → External Services
The API becomes “fire-and-forget” relative to the worker. It accepts requests and puts them in a queue. The worker processes them asynchronously.
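As a concrete sketch, here is what a fire-and-forget handler might look like in Python. `handle_request` and the in-process `queue.Queue` are illustrative stand-ins: a real system would use a web framework handler and an external broker (Redis, RabbitMQ, SQS, etc.).

```python
import queue

# Illustrative in-process queue; a real deployment would use an
# external broker so jobs survive API restarts.
job_queue = queue.Queue()

def handle_request(payload):
    """Hypothetical API handler: enqueue the job, acknowledge immediately."""
    job_queue.put(payload)         # fire-and-forget: no waiting on the worker
    return {"status": "accepted"}  # quick acknowledgment to the client
```

The handler's latency is now just the cost of an enqueue, independent of how long processing takes.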
Why Add Queues?
Smooth spikes:
- If 200 requests arrive in one second, the queue absorbs them
- Workers process at a steady rate
- System doesn’t crash
Avoid losing requests:
- If worker restarts, requests stay in queue
- No requests lost during restarts
- Better reliability
Decouple components:
- API doesn’t wait for worker
- API responds quickly
- Worker processes at its own pace
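The decoupling can be sketched with a worker thread that drains the queue at its own pace. This is a minimal in-process sketch; the doubling step stands in for real processing work:

```python
import queue
import threading

job_queue = queue.Queue()
results = []

def worker():
    # The worker pulls jobs at its own pace, independent of the producer.
    while True:
        job = job_queue.get()
        if job is None:              # sentinel: shut down cleanly
            break
        results.append(job * 2)      # placeholder for real processing
        job_queue.task_done()

t = threading.Thread(target=worker)
t.start()

# Producer side: enqueue five jobs and move on immediately.
for i in range(5):
    job_queue.put(i)

job_queue.join()                     # wait only so the demo can print results
job_queue.put(None)
t.join()
print(results)                       # [0, 2, 4, 6, 8]
```

The producer loop finishes as soon as the jobs are enqueued; only the demo's final `join` waits for processing.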
What Changes
API behavior:
- Before: API waits for worker to finish
- After: API puts request in queue and responds immediately
User experience:
- Before: User waits for full processing
- After: User gets quick acknowledgment, processing happens in background
End-to-end time:
- Users still care about total time
- But the shape is different: fast acknowledgment, then background processing
Interactive Architecture Toggle
[Interactive widget: toggle between the direct-call and queued architectures to see the difference in request flow.]
Producer and Consumer
Now we have two rates to think about:
Producer rate: requests per second into the queue (from the API)
Consumer rate: jobs per second a worker can process
If producer rate > consumer rate, the queue grows.
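This relationship is easy to see in a small discrete-time simulation (an illustrative sketch, one tick per second):

```python
def simulate_queue(producer_rate, consumer_rate, seconds):
    """Return the queue length after each second of simulated time."""
    length = 0
    history = []
    for _ in range(seconds):
        length += producer_rate                  # jobs arriving this second
        length = max(0, length - consumer_rate)  # jobs processed this second
        history.append(length)
    return history

# Producer faster than consumer: the queue grows every second.
print(simulate_queue(producer_rate=10, consumer_rate=8, seconds=5))
# -> [2, 4, 6, 8, 10]

# Consumer keeps up: the queue stays empty.
print(simulate_queue(producer_rate=5, consumer_rate=8, seconds=5))
# -> [0, 0, 0, 0, 0]
```

With a 2 jobs/second surplus, the queue grows linearly and never drains.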
Interactive Producer/Consumer Simulator
[Interactive widget: adjust the producer and consumer rates and watch the queue length change over time.]
New Trade-offs
Higher peak throughput possible:
- Queue absorbs spikes
- Workers process steadily
- Can handle bursts
But if consumers can’t keep up:
- Queue grows without bound
- Requests wait longer
- Memory usage increases
- System may crash
You need backpressure:
- Stop accepting new work when the queue is too long
- Reject requests when overloaded
- We’ll cover this next
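A minimal form of backpressure is a bounded queue that rejects work when full. This is an illustrative sketch; `handle_request` and the capacity of 100 are hypothetical:

```python
import queue

# Bounded queue: once it is full, new work is rejected instead of piling up.
job_queue = queue.Queue(maxsize=100)

def handle_request(payload):
    try:
        job_queue.put_nowait(payload)  # enqueue without blocking
        return {"status": "accepted"}
    except queue.Full:
        # Shed load: ask the client to retry later (HTTP 429 semantics).
        return {"status": "rejected", "retry_after": 1}
```

Rejecting early keeps memory bounded and gives clients an honest signal instead of an ever-growing wait.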
Key Takeaways
- Queues smooth spikes by absorbing bursts
- API becomes fire-and-forget: it responds quickly
- Workers process at a steady rate, decoupled from the API
- Queue grows if producer rate > consumer rate
- Need backpressure to prevent unbounded growth
What’s Next?
In the next page, we’ll add backpressure and load shedding. You’ll learn how to prevent queues from growing without bound and how to gracefully handle overload.