Designing Backpressure-First Systems: Surviving Traffic Spikes Without Falling Over
Your service works fine in tests. Then a promo email goes out. Or a regional outage hits. Suddenly, everything breaks.
Requests pile up. Queues fill. Databases lock. The whole system collapses.
This isn’t a scaling problem. It’s a backpressure problem. Your system doesn’t know how to say “slow down” when it’s overloaded.
This article shows how to design systems that slow down gracefully instead of crashing. Not theory. Practical patterns you can use today.
Why Backpressure Matters in Modern Systems
Here’s what happens when backpressure is missing.
A payment service handles 100 requests per second normally. A flash sale starts. Traffic jumps to 500 requests per second.
Without backpressure:
- The API accepts all 500 requests
- The queue fills up
- Workers can’t keep up
- The database gets overwhelmed
- Response times spike to 30 seconds
- Clients retry, making it worse
- The database locks up
- Everything stops
With backpressure:
- The API accepts 200 requests per second
- It rejects the rest with 429 (Too Many Requests)
- Clients back off
- The queue stays manageable
- Response times stay under 2 seconds
- The system keeps working
The difference is saying “no” early instead of accepting everything and breaking.
The Failure Chain
Backpressure failures cascade through your system.
Client → Gateway: Clients send requests. No rate limiting. All requests hit the gateway.
Gateway → API: Gateway accepts everything. No concurrency limits. All requests hit the API.
API → Async Workers: API queues everything. Unbounded queue. Workers can’t keep up.
Workers → Database: Workers hit the database. No connection pooling limits. Database gets overwhelmed.
Workers → External APIs: Workers call external services. No timeouts. Calls hang forever.
Each layer passes the problem downstream. By the time you notice, it’s too late.
Scaling Up vs Pushing Back
There are two ways to handle load: scale up or push back.
Scaling up: Add more servers. Add more workers. Add more database connections. This works until it doesn’t. And it’s expensive.
Pushing back: Reject excess requests. Slow down processing. Tell clients to wait. This keeps the system stable. And it’s free.
You need both. But most teams only do scaling. They forget about pushing back.
Where Backpressure Lives
Backpressure happens at every layer.
Client: Rate limiting. Exponential backoff. Request queuing.
API Layer: Concurrency limits. Request timeouts. Queue size limits.
Queues: Bounded queues. Priority queues. Dead letter queues.
Worker Pools: Fixed concurrency. Worker health checks. Graceful shutdown.
Database: Connection pooling. Query timeouts. Read replicas.
External Services: Circuit breakers. Timeouts. Fallbacks.
Each layer needs its own backpressure mechanism. They work together.
Mental Model: Flow vs Capacity
Think of your system as a pipeline.
Flow: How fast requests arrive. Requests per second.
Capacity: How fast you can process. Requests per second you can handle.
Buffer: How many requests you can hold. Queue size.
When flow > capacity, the buffer fills. When the buffer is full, you need backpressure.
Little’s Law in Plain Language
Little’s Law says: the average number of requests in the system equals the arrival rate multiplied by the average time each request spends in the system.
If requests arrive at 100 per second and each takes 0.1 seconds, you have 10 requests in flight on average. If you can only process 5 at a time, your throughput caps at 50 per second and the queue grows without bound.
If the queue grows faster than you can drain it, you’re in trouble.
Healthy: Arrival rate: 100/sec. Processing rate: 120/sec. Queue size: stable.
Overloaded: Arrival rate: 500/sec. Processing rate: 120/sec. Queue size: growing. Response time: increasing.
When arrival rate > processing rate, you need backpressure. Otherwise, the queue grows forever.
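A minimal sketch of that arithmetic in code (the numbers match the example above):
// Little's Law: average requests in flight = arrival rate × average time per request.
function inFlight(arrivalRatePerSec: number, avgLatencySec: number): number {
return arrivalRatePerSec * avgLatencySec;
}
// The queue stays bounded only while you can process work at least as fast as it arrives.
function queueIsStable(arrivalRatePerSec: number, processingRatePerSec: number): boolean {
return arrivalRatePerSec <= processingRatePerSec;
}
console.log(inFlight(100, 0.1)); // 10 requests in flight on average
console.log(queueIsStable(500, 120)); // false → apply backpressure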
Visualizing Healthy vs Overloaded
Healthy system:
Requests → [Queue: 10 (stable)] → [Workers: 5 (busy)] → [Database (normal)]
Queue stays small. Workers stay busy. Database handles load.
Overloaded system:
Requests → [Queue: 1000 (growing!)] → [Workers: 5 (overwhelmed)] → [Database (locked)]
Queue grows. Workers can’t keep up. Database locks up.
The fix: reject requests before the queue fills.
Backpressure Patterns at the Edge
The edge is where requests enter your system. This is where you apply the first backpressure.
Per-Client Rate Limits
Limit how fast each client can send requests.
Token bucket: Give each client a bucket of tokens. Each request costs one token. Tokens refill at a fixed rate.
class TokenBucket {
private tokens: number;
private lastRefill: number;
constructor(
private capacity: number,
private refillRate: number // tokens per second
) {
this.tokens = capacity;
this.lastRefill = Date.now();
}
tryConsume(): boolean {
this.refill();
if (this.tokens >= 1) {
this.tokens -= 1;
return true;
}
return false;
}
private refill() {
const now = Date.now();
const elapsed = (now - this.lastRefill) / 1000;
const tokensToAdd = elapsed * this.refillRate;
this.tokens = Math.min(this.capacity, this.tokens + tokensToAdd);
this.lastRefill = now;
}
getWaitTime(): number {
this.refill();
if (this.tokens >= 1) return 0;
const tokensNeeded = 1 - this.tokens;
return Math.ceil(tokensNeeded / this.refillRate * 1000);
}
}
Leaky bucket: Requests flow into a bucket. The bucket leaks at a fixed rate. If the bucket is full, reject the request.
class LeakyBucket {
private queue: number[] = [];
private lastLeak = Date.now();
constructor(
private capacity: number,
private leakRate: number // requests per second
) {}
tryAdd(): boolean {
this.leak();
if (this.queue.length < this.capacity) {
this.queue.push(Date.now());
return true;
}
return false;
}
private leak() {
// Drain the bucket at a fixed rate, regardless of when requests arrived.
const now = Date.now();
const interval = 1000 / this.leakRate;
const leaked = Math.floor((now - this.lastLeak) / interval);
if (leaked > 0) {
this.queue.splice(0, leaked);
this.lastLeak += leaked * interval;
}
}
getWaitTime(): number {
this.leak();
if (this.queue.length < this.capacity) return 0;
const interval = 1000 / this.leakRate;
// Time until the next leak frees a slot.
return Math.ceil(interval - (Date.now() - this.lastLeak));
}
}
When to use which:
- Token bucket: Smooth rate limiting. Good for APIs.
- Leaky bucket: Strict rate limiting. Good for strict quotas.
Global Concurrency Caps
Limit how many requests you process at once per endpoint.
class ConcurrencyLimiter {
private active = 0;
private queue: Array<() => void> = [];
constructor(private maxConcurrency: number) {}
async execute<T>(fn: () => Promise<T>): Promise<T> {
if (this.active < this.maxConcurrency) {
this.active++;
try {
return await fn();
} finally {
this.active--;
this.processQueue();
}
}
return new Promise((resolve, reject) => {
this.queue.push(async () => {
this.active++;
try {
const result = await fn();
resolve(result);
} catch (error) {
reject(error);
} finally {
this.active--;
this.processQueue();
}
});
});
}
private processQueue() {
if (this.queue.length > 0 && this.active < this.maxConcurrency) {
const next = this.queue.shift();
if (next) next();
}
}
}
If concurrency is at max, queue the request. When a request finishes, process the next one.
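Usage might look like this in an Express-style handler (searchService is a hypothetical dependency):
// Hypothetical endpoint: cap expensive searches at 50 concurrent executions.
// Excess requests wait in the limiter's internal queue instead of hitting the backend.
const searchLimiter = new ConcurrencyLimiter(50);
app.get('/api/search', async (req, res) => {
const results = await searchLimiter.execute(() => searchService.query(req.query.q));
res.json(results);
});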
Connection Timeouts and Deadlines
Set timeouts at every layer.
HTTP timeout:
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 5000);
try {
const response = await fetch(url, { signal: controller.signal });
clearTimeout(timeoutId);
return response;
} catch (error) {
if (error.name === 'AbortError') {
throw new Error('Request timeout');
}
throw error;
}
Request context:
class RequestContext {
private deadline: number;
private cancelled = false;
constructor(timeoutMs: number) {
this.deadline = Date.now() + timeoutMs;
}
isExpired(): boolean {
return Date.now() >= this.deadline || this.cancelled;
}
cancel() {
this.cancelled = true;
}
getRemainingTime(): number {
return Math.max(0, this.deadline - Date.now());
}
}
Propagate the context through your call stack. Check it before expensive operations.
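For example, a downstream call can check the deadline first and give the request only the time that is left (the endpoint here is illustrative):
async function fetchProfile(userId: string, ctx: RequestContext): Promise<Response> {
// Fail fast if the deadline already passed, then budget the call with the remaining time.
if (ctx.isExpired()) throw new Error('Deadline exceeded');
const controller = new AbortController();
const timer = setTimeout(() => controller.abort(), ctx.getRemainingTime());
try {
return await fetch(`/api/users/${userId}`, { signal: controller.signal });
} finally {
clearTimeout(timer);
}
}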
HTTP Status Codes and Retry-After
Tell clients when to retry.
429 Too Many Requests:
if (rateLimiter.isRateLimited(clientId)) {
const waitTime = rateLimiter.getWaitTime(clientId);
res.status(429)
.setHeader('Retry-After', Math.ceil(waitTime / 1000))
.json({ error: 'Rate limit exceeded', retryAfter: waitTime });
return;
}
503 Service Unavailable:
if (queue.isFull()) {
res.status(503)
.setHeader('Retry-After', '10')
.json({ error: 'Service overloaded', retryAfter: 10 });
return;
}
Clients should respect Retry-After. Don’t retry immediately.
Backpressure Inside Your Services
Edge backpressure isn’t enough. You need backpressure inside your services too.
Bounded Worker Pools and Job Queues
Never use unbounded queues. Always set a max size.
class BoundedQueue<T> {
private queue: T[] = [];
constructor(private maxSize: number) {}
enqueue(item: T): boolean {
if (this.queue.length >= this.maxSize) {
return false;
}
this.queue.push(item);
return true;
}
dequeue(): T | undefined {
return this.queue.shift();
}
get size() {
return this.queue.length;
}
get isFull() {
return this.queue.length >= this.maxSize;
}
}
Worker pool:
interface Job {
execute(): Promise<void>;
handleError(error: unknown): Promise<void>;
}
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));
class WorkerPool {
private workers: Promise<void>[] = [];
private queue: BoundedQueue<Job>;
constructor(
private numWorkers: number,
queueSize: number
) {
this.queue = new BoundedQueue<Job>(queueSize);
this.startWorkers();
}
private startWorkers() {
// Each "worker" is a long-running async loop that drains the shared queue.
for (let i = 0; i < this.numWorkers; i++) {
this.workers.push(this.processJob());
}
}
async submit(job: Job): Promise<void> {
if (!this.queue.enqueue(job)) {
throw new Error('Queue full');
}
}
private async processJob() {
while (true) {
const job = this.queue.dequeue();
if (!job) {
await sleep(100);
continue;
}
try {
await job.execute();
} catch (error) {
await job.handleError(error);
}
}
}
}
Fixed workers. Bounded queue. If the queue is full, reject new jobs.
Dropping Low-Priority Work
When overloaded, drop less important work first.
class PriorityQueue {
private queues: Map<number, Job[]> = new Map();
enqueue(job: Job, priority: number) {
if (!this.queues.has(priority)) {
this.queues.set(priority, []);
}
this.queues.get(priority)!.push(job);
}
dequeue(): Job | undefined {
const priorities = Array.from(this.queues.keys()).sort((a, b) => b - a);
for (const priority of priorities) {
const queue = this.queues.get(priority);
if (queue && queue.length > 0) {
return queue.shift();
}
}
return undefined;
}
dropLowPriority(): number {
const lowPriority = Math.min(...Array.from(this.queues.keys()));
const queue = this.queues.get(lowPriority);
if (queue) {
const dropped = queue.length;
queue.length = 0;
return dropped;
}
return 0;
}
}
When the queue is full, drop low-priority jobs. Keep high-priority work.
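A sketch of how this fits together (the jobs and the overload signal are illustrative placeholders):
// analyticsJob, paymentJob and overloaded are hypothetical; wire them to your own signals.
const jobs = new PriorityQueue();
jobs.enqueue(analyticsJob, 1); // lower number = lower priority, dropped first
jobs.enqueue(paymentJob, 10); // critical work, kept as long as possible
if (overloaded) {
const dropped = jobs.dropLowPriority();
console.warn(`Dropped ${dropped} low-priority jobs`);
}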
Circuit Breakers
Stop calling slow dependencies.
enum CircuitState {
CLOSED,
OPEN,
HALF_OPEN
}
class CircuitBreaker {
private state = CircuitState.CLOSED;
private failures = 0;
private lastFailureTime = 0;
constructor(
private failureThreshold: number,
private timeout: number // how long to stay open before probing again, in ms
) {}
async execute<T>(fn: () => Promise<T>): Promise<T> {
if (this.state === CircuitState.OPEN) {
if (Date.now() - this.lastFailureTime > this.timeout) {
this.state = CircuitState.HALF_OPEN;
this.failures = 0;
} else {
throw new Error('Circuit breaker is open');
}
}
try {
const result = await fn();
this.onSuccess();
return result;
} catch (error) {
this.onFailure();
throw error;
}
}
private onSuccess() {
this.failures = 0;
if (this.state === CircuitState.HALF_OPEN) {
this.state = CircuitState.CLOSED;
}
}
private onFailure() {
this.failures++;
this.lastFailureTime = Date.now();
if (this.state === CircuitState.HALF_OPEN) {
this.state = CircuitState.OPEN;
} else if (this.failures >= this.failureThreshold) {
this.state = CircuitState.OPEN;
}
}
getState(): CircuitState {
return this.state;
}
}
After N failures, open the circuit. Stop calling the dependency. After timeout, try half-open. If it works, close it. If not, open it again.
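A sketch of wrapping an external dependency (paymentService and Order are assumed here, like in the order example later):
// paymentService is a hypothetical external client.
const paymentBreaker = new CircuitBreaker(5, 10000); // open after 5 failures, probe again after 10s
async function chargeWithBreaker(order: Order) {
try {
return await paymentBreaker.execute(() => paymentService.charge(order));
} catch (error) {
// Circuit open or call failed: fail fast and fall back instead of piling onto a sick dependency.
return { success: false, reason: 'payment_unavailable' };
}
}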
Choosing Safe Queue Sizes and Worker Counts
Queue size: Rule of thumb: 2-3x your worker capacity.
If you have 10 workers processing 10 req/sec each, you can handle 100 req/sec. Set queue size to 200-300. This gives you a 2-3 second buffer.
Too small: Queue fills quickly. Lots of rejections. Too large: Requests wait too long. Stale responses.
Worker count: Rule of thumb: CPU cores × 2 for I/O-bound work. CPU cores for CPU-bound work.
If you have 4 CPU cores and work is I/O-bound (database, APIs), use 8 workers. If work is CPU-bound (image processing), use 4 workers.
Monitor queue depth. If it’s always growing, add workers. If it’s always empty, you have too many.
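As a quick sketch of that arithmetic (the per-worker throughput is something you should measure, not guess):
// Rule-of-thumb sizing for an I/O-bound service on a 4-core machine.
const cpuCores = 4;
const workers = cpuCores * 2; // I/O-bound: 2× cores
const perWorkerThroughput = 10; // req/sec each — measure this under real load
const capacity = workers * perWorkerThroughput; // ~80 req/sec
const queueSize = capacity * 3; // ~3 seconds of buffer at the full drain rate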
Protecting Databases and External APIs
Databases are usually the bottleneck. External APIs are unpredictable. Both need protection.
Why Databases Are Usually the Bottleneck
Databases have limited connections. Limited CPU. Limited I/O.
When you overload a database:
- Connections exhaust
- Queries queue
- Locks accumulate
- Everything slows down
You can’t scale a database like you scale application servers. You need to protect it.
Query Timeouts and Cancellation
Set timeouts on every query.
async function queryWithTimeout<T>(
query: string,
params: any[],
timeoutMs: number
): Promise<T> {
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), timeoutMs);
try {
const result = await db.query(query, params, { signal: controller.signal });
clearTimeout(timeoutId);
return result;
} catch (error) {
if (error.name === 'AbortError') {
throw new Error('Query timeout');
}
throw error;
}
}
Cancel long-running queries. Don’t let them hang forever.
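Many drivers can enforce this for you as well. With node-postgres, for example, a pool can be configured roughly like this (a sketch; check your driver's options):
import { Pool } from 'pg';
// Cap connections, fail fast when the pool is exhausted, and let the server
// cancel queries that run too long.
const pgPool = new Pool({
max: 20, // hard limit on open connections
connectionTimeoutMillis: 2000, // stop waiting for a free connection after 2s
statement_timeout: 5000, // server-side cancel after 5s
});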
Bulkheads: Separate Pools for Different Traffic
Don’t let one traffic class kill everything else.
class Bulkhead {
private pools: Map<string, ConnectionPool> = new Map();
getPool(trafficClass: string): ConnectionPool {
if (!this.pools.has(trafficClass)) {
this.pools.set(trafficClass, new ConnectionPool({
maxConnections: this.getMaxConnections(trafficClass)
}));
}
return this.pools.get(trafficClass)!;
}
private getMaxConnections(trafficClass: string): number {
switch (trafficClass) {
case 'critical': return 20;
case 'normal': return 10;
case 'best-effort': return 5;
default: return 5;
}
}
}
Critical traffic gets its own pool. If best-effort traffic overwhelms its pool, it doesn’t affect critical traffic.
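Usage might look like this (the pool names follow the traffic classes above):
const bulkhead = new Bulkhead();
// Checkout queries get the critical pool; analytics can only exhaust its own small pool.
const checkoutDb = bulkhead.getPool('critical');
const analyticsDb = bulkhead.getPool('best-effort');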
Throttling Outbound Calls
Limit how fast you call external APIs.
class OutboundThrottle {
private lastCallTime = 0;
private minInterval: number;
constructor(callsPerSecond: number) {
this.minInterval = 1000 / callsPerSecond;
}
async throttle<T>(fn: () => Promise<T>): Promise<T> {
const now = Date.now();
const elapsed = now - this.lastCallTime;
const waitTime = Math.max(0, this.minInterval - elapsed);
if (waitTime > 0) {
await sleep(waitTime);
}
this.lastCallTime = Date.now();
return fn();
}
}
Don’t hammer external APIs. Respect their rate limits. Your system depends on them.
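For example, wrapping every call to an email provider (emailService is a hypothetical client, like the one in the order example below):
// No matter how many workers want to send, the provider sees at most ~5 calls per second.
const emailThrottle = new OutboundThrottle(5);
async function sendReceipt(to: string, body: string) {
return emailThrottle.throttle(() => emailService.send({ to, subject: 'Your receipt', body }));
}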
End-to-End Design Example
Let’s walk through a “Place Order” flow with backpressure at every layer.
Client Retry Rules
class OrderClient {
async placeOrder(order: Order): Promise<OrderResult> {
const maxRetries = 3;
let attempt = 0;
while (attempt < maxRetries) {
try {
const response = await fetch('/api/orders', {
method: 'POST',
body: JSON.stringify(order),
signal: AbortSignal.timeout(5000)
});
if (response.status === 429) {
const retryAfter = parseInt(response.headers.get('Retry-After') || '1');
await sleep(retryAfter * 1000);
attempt++;
continue;
}
if (response.status === 503) {
const retryAfter = parseInt(response.headers.get('Retry-After') || '10');
await sleep(retryAfter * 1000);
attempt++;
continue;
}
if (!response.ok) {
throw new Error(`HTTP ${response.status}`);
}
return await response.json();
} catch (error) {
if (error.name === 'AbortError') {
attempt++;
if (attempt >= maxRetries) throw error;
await sleep(Math.pow(2, attempt) * 1000); // Exponential backoff
continue;
}
throw error;
}
}
throw new Error('Max retries exceeded');
}
}
Respect 429 and 503. Use exponential backoff. Don’t retry forever.
API Concurrency and Queue Limits
const orderLimiter = new ConcurrencyLimiter(50);
const orderQueue = new BoundedQueue<Order>(200);
app.post('/api/orders', async (req, res) => {
const order = req.body;
// Check rate limit
if (!rateLimiter.allow(req.userId)) {
const waitTime = rateLimiter.getWaitTime(req.userId);
return res.status(429)
.setHeader('Retry-After', Math.ceil(waitTime / 1000))
.json({ error: 'Rate limit exceeded' });
}
// Check queue capacity
if (orderQueue.isFull()) {
return res.status(503)
.setHeader('Retry-After', '10')
.json({ error: 'Service overloaded' });
}
// Enqueue with concurrency limit
try {
await orderLimiter.execute(async () => {
if (!orderQueue.enqueue(order)) {
throw new Error('Queue full');
}
});
res.status(202).json({ message: 'Order queued', orderId: order.id });
} catch (error) {
res.status(503).json({ error: 'Service overloaded' });
}
});
Rate limit. Check queue. Enqueue with concurrency limit. Reject if overloaded.
Worker Pool Logic
const workerPool = new WorkerPool(10, 200);
class OrderWorker {
async processOrder(order: Order) {
const context = new RequestContext(30000); // 30 second deadline
try {
// Validate order
await this.validateOrder(order, context);
// Check inventory
const inventory = await this.checkInventory(order.items, context);
if (!inventory.available) {
throw new Error('Out of stock');
}
// Process payment
const payment = await this.processPayment(order, context);
if (!payment.success) {
throw new Error('Payment failed');
}
// Create order
const orderRecord = await this.createOrder(order, payment, context);
// Send confirmation
await this.sendConfirmation(orderRecord, context);
return orderRecord;
} catch (error) {
if (context.isExpired()) {
throw new Error('Request timeout');
}
throw error;
}
}
private async validateOrder(order: Order, context: RequestContext) {
if (context.isExpired()) throw new Error('Timeout');
// Validation logic
}
private async checkInventory(items: Item[], context: RequestContext) {
if (context.isExpired()) throw new Error('Timeout');
return await db.queryWithTimeout(
'SELECT * FROM inventory WHERE item_id IN (...)',
[items.map(i => i.id)],
context.getRemainingTime()
);
}
private async processPayment(order: Order, context: RequestContext) {
if (context.isExpired()) throw new Error('Timeout');
return await paymentCircuitBreaker.execute(async () => {
return await paymentService.charge(order, {
timeout: context.getRemainingTime()
});
});
}
private async createOrder(order: Order, payment: Payment, context: RequestContext) {
if (context.isExpired()) throw new Error('Timeout');
return await db.queryWithTimeout(
'INSERT INTO orders ...',
[order, payment],
context.getRemainingTime()
);
}
private async sendConfirmation(order: Order, context: RequestContext) {
if (context.isExpired()) return; // Best effort, don't fail
try {
await emailService.send({
to: order.email,
subject: 'Order confirmed',
body: `Your order ${order.id} is confirmed`
}, {
timeout: context.getRemainingTime()
});
} catch (error) {
// Log but don't fail
console.error('Failed to send confirmation', error);
}
}
}
Fixed workers. Bounded queue. Timeouts everywhere. Circuit breakers for external calls.
Database Timeout and Retry Rules
class DatabasePool {
private pool: Connection[];
constructor(private maxConnections: number) {
// Pre-create a fixed set of connections; the pool never grows beyond this.
this.pool = Array.from({ length: maxConnections }, () => new Connection());
}
async getConnection(timeoutMs: number): Promise<Connection> {
const start = Date.now();
// Wait for a connection to be returned, but never longer than the caller's deadline.
while (this.pool.length === 0 && Date.now() - start < timeoutMs) {
await sleep(10);
}
if (this.pool.length === 0) {
throw new Error('Connection pool exhausted');
}
return this.pool.pop()!;
}
async query<T>(sql: string, params: any[], timeoutMs: number): Promise<T> {
const connection = await this.getConnection(timeoutMs);
try {
return await connection.queryWithTimeout(sql, params, timeoutMs);
} finally {
// Always return the connection, even when the query fails.
this.pool.push(connection);
}
}
}
Connection pool with timeout. Query timeout. Return connections to pool.
System Behavior Under Load
Normal load (100 req/sec):
- Queue depth: 10-20
- Response time: 200ms
- Database connections: 20/50
- Rejections: 0
Sudden 5x spike (500 req/sec):
- Queue depth: 150-200 (hits limit)
- Response time: 500ms
- Database connections: 45/50
- Rejections: 300 req/sec (429/503)
- System stays stable
Downstream outage (payment service down):
- Circuit breaker opens after 5 failures
- Payment calls fail fast
- Orders queue but don’t process
- Queue depth grows
- After 10 seconds, queue full, new orders rejected with 503
- System stays stable, doesn’t crash
Backpressure at every layer keeps the system stable.
Observability for Backpressure
You can’t manage what you can’t measure. Track backpressure metrics.
What to Measure
Queue depth: How many items are waiting. Track min, max, avg, p95, p99.
class QueueMetrics {
private depths: number[] = [];
recordDepth(depth: number) {
this.depths.push(depth);
if (this.depths.length > 1000) {
this.depths.shift();
}
}
getStats() {
const sorted = [...this.depths].sort((a, b) => a - b);
return {
min: sorted[0],
max: sorted[sorted.length - 1],
avg: sorted.reduce((a, b) => a + b, 0) / sorted.length,
p95: sorted[Math.floor(sorted.length * 0.95)],
p99: sorted[Math.floor(sorted.length * 0.99)]
};
}
}
In-flight requests: How many requests are being processed right now.
class InFlightTracker {
private count = 0;
start() {
this.count++;
}
finish() {
this.count--;
}
getCount() {
return this.count;
}
}
Latency percentiles: Track p50, p95, p99 latency per dependency.
class LatencyTracker {
private latencies: Map<string, number[]> = new Map();
record(dependency: string, latency: number) {
if (!this.latencies.has(dependency)) {
this.latencies.set(dependency, []);
}
this.latencies.get(dependency)!.push(latency);
}
getPercentiles(dependency: string) {
const latencies = this.latencies.get(dependency) || [];
const sorted = [...latencies].sort((a, b) => a - b);
return {
p50: sorted[Math.floor(sorted.length * 0.5)],
p95: sorted[Math.floor(sorted.length * 0.95)],
p99: sorted[Math.floor(sorted.length * 0.99)]
};
}
}
Rejection counts: Track why requests are rejected.
class RejectionTracker {
private counts: Map<string, number> = new Map();
record(reason: string) {
this.counts.set(reason, (this.counts.get(reason) || 0) + 1);
}
getCounts() {
return Object.fromEntries(this.counts);
}
}
Reasons: rate_limit, queue_full, timeout, circuit_open, connection_pool_exhausted.
How to Alert Without Noise
Don’t alert on every spike. Alert on sustained problems.
Queue growth: Alert if queue depth > threshold for 5 minutes.
class QueueAlert {
private violations = 0;
check(depth: number, threshold: number) {
if (depth > threshold) {
this.violations++;
if (this.violations >= 5) { // 5 consecutive checks
this.alert('Queue depth exceeded threshold');
}
} else {
this.violations = 0;
}
}
}
Rising latency: Alert if p95 latency increases by 2x for 5 minutes.
class LatencyAlert {
private baseline: number | null = null;
private violations = 0;
check(p95Latency: number) {
if (this.baseline === null) {
this.baseline = p95Latency;
return;
}
if (p95Latency > this.baseline * 2) {
this.violations++;
if (this.violations >= 5) {
this.alert('Latency increased significantly');
}
} else {
this.violations = 0;
this.baseline = p95Latency; // Update baseline
}
}
}
Rejection rate: Alert if rejection rate > 10% for 5 minutes.
class RejectionAlert {
private rejections = 0;
private total = 0;
record(rejected: boolean) {
this.total++;
if (rejected) this.rejections++;
}
check() {
const rate = this.rejections / this.total;
if (rate > 0.1 && this.total > 100) {
this.alert(`Rejection rate: ${(rate * 100).toFixed(1)}%`);
}
// Reset every 5 minutes
this.rejections = 0;
this.total = 0;
}
}
Alert on sustained problems, not transient spikes.
Practical Checklist
Use this checklist when designing backpressure.
Client:
- Rate limiting with exponential backoff
- Respect 429 and 503 status codes
- Respect Retry-After headers
- Request timeouts
- Request queuing for non-critical requests
API Layer:
- Per-client rate limits (token bucket or leaky bucket)
- Global concurrency limits per endpoint
- Request timeouts and deadlines
- Bounded request queues
- Proper HTTP status codes (429, 503)
Queues:
- Bounded queue size (2-3x worker capacity)
- Priority queues for different traffic classes
- Dead letter queues for failed jobs
- Queue depth monitoring
Worker Pools:
- Fixed worker count (CPU cores × 2 for I/O-bound)
- Worker health checks
- Graceful shutdown
- In-flight request tracking
Database:
- Connection pooling with limits
- Query timeouts
- Query cancellation
- Bulkheads for different traffic classes
- Read replicas for read-heavy workloads
External APIs:
- Circuit breakers
- Request timeouts
- Rate limiting/throttling
- Fallback responses
- Retry with exponential backoff
Observability:
- Queue depth metrics
- In-flight request counts
- Latency percentiles per dependency
- Rejection counts by reason
- Alerts on sustained problems
Testing:
- Load testing with traffic spikes
- Chaos testing with downstream failures
- Verify backpressure works at each layer
- Verify system recovers after overload
If you can check all these, your system will handle traffic spikes gracefully.
Conclusion
Backpressure isn’t optional. It’s essential.
When traffic spikes or downstream services fail, systems without backpressure crash. Systems with backpressure slow down gracefully.
The patterns are simple:
- Rate limit at the edge
- Bound your queues
- Limit your concurrency
- Timeout everything
- Use circuit breakers
- Monitor and alert
But most teams skip them. They focus on scaling up. They forget about pushing back.
Don’t make that mistake. Design backpressure from the start. Your system will thank you.