Designing Backpressure-First Systems: Surviving Traffic Spikes Without Falling Over
Your service works fine in tests. Then a promo email goes out. Or a regional outage hits. Suddenly, everything breaks.
Requests pile up. Queues fill. Databases lock. The whole system collapses.
This isn’t a scaling problem. It’s a backpressure problem. Your system doesn’t know how to say “slow down” when it’s overloaded.
This article shows how to design systems that slow down gracefully instead of crashing. Not theory. Practical patterns you can use today.
Why Backpressure Matters in Modern Systems
Here’s what happens when backpressure is missing.
A payment service handles 100 requests per second normally. A flash sale starts. Traffic jumps to 500 requests per second.
Without backpressure:
- The API accepts all 500 requests
- The queue fills up
- Workers can’t keep up
- The database gets overwhelmed
- Response times spike to 30 seconds
- Clients retry, making it worse
- The database locks up
- Everything stops
With backpressure:
- The API accepts 200 requests per second
- It rejects the rest with 429 (Too Many Requests)
- Clients back off
- The queue stays manageable
- Response times stay under 2 seconds
- The system keeps working
The difference is saying “no” early instead of accepting everything and breaking.
The Failure Chain
Backpressure failures cascade through your system.
Client → Gateway: Clients send requests. No rate limiting. All requests hit the gateway.
Gateway → API: Gateway accepts everything. No concurrency limits. All requests hit the API.
API → Async Workers: API queues everything. Unbounded queue. Workers can’t keep up.
Workers → Database: Workers hit the database. No connection pooling limits. Database gets overwhelmed.
Workers → External APIs: Workers call external services. No timeouts. Calls hang forever.
Each layer passes the problem downstream. By the time you notice, it’s too late.
Scaling Up vs Pushing Back
There are two ways to handle load: scale up or push back.
Scaling up: Add more servers. Add more workers. Add more database connections. This works until it doesn’t. And it’s expensive.
Pushing back: Reject excess requests. Slow down processing. Tell clients to wait. This keeps the system stable. And it’s free.
You need both. But most teams only do scaling. They forget about pushing back.
Where Backpressure Lives
Backpressure happens at every layer.
Client: Rate limiting. Exponential backoff. Request queuing.
API Layer: Concurrency limits. Request timeouts. Queue size limits.
Queues: Bounded queues. Priority queues. Dead letter queues.
Worker Pools: Fixed concurrency. Worker health checks. Graceful shutdown.
Database: Connection pooling. Query timeouts. Read replicas.
External Services: Circuit breakers. Timeouts. Fallbacks.
Each layer needs its own backpressure mechanism. They work together.
Mental Model: Flow vs Capacity
Think of your system as a pipeline.
Flow: How fast requests arrive. Requests per second.
Capacity: How fast you can process. Requests per second you can handle.
Buffer: How many requests you can hold. Queue size.
When flow > capacity, the buffer fills. When the buffer is full, you need backpressure.
Little’s Law in Plain Language
Little’s Law says: the average number of requests in the system equals the arrival rate multiplied by the average time each request spends in the system.
If requests arrive at 100 per second and each takes 0.1 seconds, you have 10 requests in flight on average. If you can only process 5 at a time, your throughput caps at 50 per second and the queue grows without bound.
If the queue grows faster than you can drain it, you’re in trouble.
Healthy: Arrival rate: 100/sec. Processing rate: 120/sec. Queue size: stable.
Overloaded: Arrival rate: 500/sec. Processing rate: 120/sec. Queue size: growing. Response time: increasing.
When arrival rate > processing rate, you need backpressure. Otherwise, the queue grows forever.
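A minimal sketch of that arithmetic in code (the numbers match the example above):
// Little's Law: average requests in flight = arrival rate × average time per request.
function inFlight(arrivalRatePerSec: number, avgLatencySec: number): number {
return arrivalRatePerSec * avgLatencySec;
}
// The queue stays bounded only while you can process work at least as fast as it arrives.
function queueIsStable(arrivalRatePerSec: number, processingRatePerSec: number): boolean {
return arrivalRatePerSec <= processingRatePerSec;
}
console.log(inFlight(100, 0.1)); // 10 requests in flight on average
console.log(queueIsStable(500, 120)); // false → apply backpressure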
Visualizing Healthy vs Overloaded
Healthy system:
Requests → [Queue: 10 (stable)] → [Workers: 5 (busy)] → [Database (normal)]
Queue stays small. Workers stay busy. Database handles load.
Overloaded system:
Requests → [Queue: 1000 (growing!)] → [Workers: 5 (overwhelmed)] → [Database (locked)]
Queue grows. Workers can’t keep up. Database locks up.
The fix: reject requests before the queue fills.
Backpressure Patterns at the Edge
The edge is where requests enter your system. This is where you apply the first backpressure.
Per-Client Rate Limits
Limit how fast each client can send requests.
Token bucket: Give each client a bucket of tokens. Each request costs one token. Tokens refill at a fixed rate.
class TokenBucket {
private tokens: number;
private lastRefill: number;
constructor(
private capacity: number,
private refillRate: number // tokens per second
) {
this.tokens = capacity;
this.lastRefill = Date.now();
}
tryConsume(): boolean {
this.refill();
if (this.tokens >= 1) {
this.tokens -= 1;
return true;
}
return false;
}
private refill() {
const now = Date.now();
const elapsed = (now - this.lastRefill) / 1000;
const tokensToAdd = elapsed * this.refillRate;
this.tokens = Math.min(this.capacity, this.tokens + tokensToAdd);
this.lastRefill = now;
}
getWaitTime(): number {
this.refill();
if (this.tokens >= 1) return 0;
const tokensNeeded = 1 - this.tokens;
return Math.ceil(tokensNeeded / this.refillRate * 1000);
}
}
Leaky bucket: Requests flow into a bucket. The bucket leaks at a fixed rate. If the bucket is full, reject the request.
class LeakyBucket {
private queue: number[] = [];
private lastLeak = Date.now();
constructor(
private capacity: number,
private leakRate: number // requests per second
) {}
tryAdd(): boolean {
this.leak();
if (this.queue.length < this.capacity) {
this.queue.push(Date.now());
return true;
}
return false;
}
private leak() {
// Drain the bucket at a fixed rate, regardless of when requests arrived.
const now = Date.now();
const interval = 1000 / this.leakRate;
const leaked = Math.floor((now - this.lastLeak) / interval);
if (leaked > 0) {
this.queue.splice(0, leaked);
this.lastLeak += leaked * interval;
}
}
getWaitTime(): number {
this.leak();
if (this.queue.length < this.capacity) return 0;
const interval = 1000 / this.leakRate;
// Time until the next leak frees a slot.
return Math.ceil(interval - (Date.now() - this.lastLeak));
}
}
When to use which:
- Token bucket: Smooth rate limiting. Good for APIs.
- Leaky bucket: Strict rate limiting. Good for strict quotas.
Global Concurrency Caps
Limit how many requests you process at once per endpoint.
class ConcurrencyLimiter {
private active = 0;
private queue: Array<() => void> = [];
constructor(private maxConcurrency: number) {}
async execute<T>(fn: () => Promise<T>): Promise<T> {
if (this.active < this.maxConcurrency) {
this.active++;
try {
return await fn();
} finally {
this.active--;
this.processQueue();
}
}
return new Promise((resolve, reject) => {
this.queue.push(async () => {
this.active++;
try {
const result = await fn();
resolve(result);
} catch (error) {
reject(error);
} finally {
this.active--;
this.processQueue();
}
});
});
}
private processQueue() {
if (this.queue.length > 0 && this.active < this.maxConcurrency) {
const next = this.queue.shift();
if (next) next();
}
}
}
If concurrency is at max, queue the request. When a request finishes, process the next one.
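Usage might look like this in an Express-style handler (searchService is a hypothetical dependency):
// Hypothetical endpoint: cap expensive searches at 50 concurrent executions.
// Excess requests wait in the limiter's internal queue instead of hitting the backend.
const searchLimiter = new ConcurrencyLimiter(50);
app.get('/api/search', async (req, res) => {
const results = await searchLimiter.execute(() => searchService.query(req.query.q));
res.json(results);
});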
Connection Timeouts and Deadlines
Set timeouts at every layer.
HTTP timeout:
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 5000);
try {
const response = await fetch(url, { signal: controller.signal });
clearTimeout(timeoutId);
return response;
} catch (error) {
if (error.name === 'AbortError') {
throw new Error('Request timeout');
}
throw error;
}
Request context:
class RequestContext {
private deadline: number;
private cancelled = false;
constructor(timeoutMs: number) {
this.deadline = Date.now() + timeoutMs;
}
isExpired(): boolean {
return Date.now() >= this.deadline || this.cancelled;
}
cancel() {
this.cancelled = true;
}
getRemainingTime(): number {
return Math.max(0, this.deadline - Date.now());
}
}
Propagate the context through your call stack. Check it before expensive operations.
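For example, a downstream call can check the deadline first and give the request only the time that is left (the endpoint here is illustrative):
async function fetchProfile(userId: string, ctx: RequestContext): Promise<Response> {
// Fail fast if the deadline already passed, then budget the call with the remaining time.
if (ctx.isExpired()) throw new Error('Deadline exceeded');
const controller = new AbortController();
const timer = setTimeout(() => controller.abort(), ctx.getRemainingTime());
try {
return await fetch(`/api/users/${userId}`, { signal: controller.signal });
} finally {
clearTimeout(timer);
}
}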
HTTP Status Codes and Retry-After
Tell clients when to retry.
429 Too Many Requests:
if (rateLimiter.isRateLimited(clientId)) {
const waitTime = rateLimiter.getWaitTime(clientId);
res.status(429)
.setHeader('Retry-After', Math.ceil(waitTime / 1000))
.json({ error: 'Rate limit exceeded', retryAfter: waitTime });
return;
}
503 Service Unavailable:
if (queue.isFull()) {
res.status(503)
.setHeader('Retry-After', '10')
.json({ error: 'Service overloaded', retryAfter: 10 });
return;
}
Clients should respect Retry-After. Don’t retry immediately.
Backpressure Inside Your Services
Edge backpressure isn’t enough. You need backpressure inside your services too.
Bounded Worker Pools and Job Queues
Never use unbounded queues. Always set a max size.
class BoundedQueue<T> {
private queue: T[] = [];
constructor(private maxSize: number) {}
enqueue(item: T): boolean {
if (this.queue.length >= this.maxSize) {
return false;
}
this.queue.push(item);
return true;
}
dequeue(): T | undefined {
return this.queue.shift();
}
get size() {
return this.queue.length;
}
get isFull() {
return this.queue.length >= this.maxSize;
}
}
Worker pool:
interface Job {
execute(): Promise<void>;
handleError(error: unknown): Promise<void>;
}
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));
class WorkerPool {
private workers: Promise<void>[] = [];
private queue: BoundedQueue<Job>;
constructor(
private numWorkers: number,
queueSize: number
) {
this.queue = new BoundedQueue<Job>(queueSize);
this.startWorkers();
}
private startWorkers() {
// Each "worker" is a long-running async loop that drains the shared queue.
for (let i = 0; i < this.numWorkers; i++) {
this.workers.push(this.processJob());
}
}
async submit(job: Job): Promise<void> {
if (!this.queue.enqueue(job)) {
throw new Error('Queue full');
}
}
private async processJob() {
while (true) {
const job = this.queue.dequeue();
if (!job) {
await sleep(100);
continue;
}
try {
await job.execute();
} catch (error) {
await job.handleError(error);
}
}
}
}
Fixed workers. Bounded queue. If the queue is full, reject new jobs.
Dropping Low-Priority Work
When overloaded, drop less important work first.
class PriorityQueue {
private queues: Map<number, Job[]> = new Map();
enqueue(job: Job, priority: number) {
if (!this.queues.has(priority)) {
this.queues.set(priority, []);
}
this.queues.get(priority)!.push(job);
}
dequeue(): Job | undefined {
const priorities = Array.from(this.queues.keys()).sort((a, b) => b - a);
for (const priority of priorities) {
const queue = this.queues.get(priority);
if (queue && queue.length > 0) {
return queue.shift();
}
}
return undefined;
}
dropLowPriority(): number {
const lowPriority = Math.min(...Array.from(this.queues.keys()));
const queue = this.queues.get(lowPriority);
if (queue) {
const dropped = queue.length;
queue.length = 0;
return dropped;
}
return 0;
}
}
When the queue is full, drop low-priority jobs. Keep high-priority work.
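A sketch of how this fits together (the jobs and the overload signal are illustrative placeholders):
// analyticsJob, paymentJob and overloaded are hypothetical; wire them to your own signals.
const jobs = new PriorityQueue();
jobs.enqueue(analyticsJob, 1); // lower number = lower priority, dropped first
jobs.enqueue(paymentJob, 10); // critical work, kept as long as possible
if (overloaded) {
const dropped = jobs.dropLowPriority();
console.warn(`Dropped ${dropped} low-priority jobs`);
}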
Circuit Breakers
Stop calling slow dependencies.
enum CircuitState {
CLOSED,
OPEN,
HALF_OPEN
}
class CircuitBreaker {
private state = CircuitState.CLOSED;
private failures = 0;
private lastFailureTime = 0;
constructor(
private failureThreshold: number,
private timeout: number // how long to stay open before probing again, in ms
) {}
async execute<T>(fn: () => Promise<T>): Promise<T> {
if (this.state === CircuitState.OPEN) {
if (Date.now() - this.lastFailureTime > this.timeout) {
this.state = CircuitState.HALF_OPEN;
this.failures = 0;
} else {
throw new Error('Circuit breaker is open');
}
}
try {
const result = await fn();
this.onSuccess();
return result;
} catch (error) {
this.onFailure();
throw error;
}
}
private onSuccess() {
this.failures = 0;
if (this.state === CircuitState.HALF_OPEN) {
this.state = CircuitState.CLOSED;
}
}
private onFailure() {
this.failures++;
this.lastFailureTime = Date.now();
if (this.state === CircuitState.HALF_OPEN) {
this.state = CircuitState.OPEN;
} else if (this.failures >= this.failureThreshold) {
this.state = CircuitState.OPEN;
}
}
getState(): CircuitState {
return this.state;
}
}
After N failures, open the circuit. Stop calling the dependency. After timeout, try half-open. If it works, close it. If not, open it again.
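A sketch of wrapping an external dependency (paymentService and Order are assumed here, like in the order example later):
// paymentService is a hypothetical external client.
const paymentBreaker = new CircuitBreaker(5, 10000); // open after 5 failures, probe again after 10s
async function chargeWithBreaker(order: Order) {
try {
return await paymentBreaker.execute(() => paymentService.charge(order));
} catch (error) {
// Circuit open or call failed: fail fast and fall back instead of piling onto a sick dependency.
return { success: false, reason: 'payment_unavailable' };
}
}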
Choosing Safe Queue Sizes and Worker Counts
Queue size: Rule of thumb: 2-3x your worker capacity.
If you have 10 workers processing 10 req/sec each, you can handle 100 req/sec. Set queue size to 200-300. This gives you a 2-3 second buffer.
Too small: Queue fills quickly. Lots of rejections. Too large: Requests wait too long. Stale responses.
Worker count: Rule of thumb: CPU cores × 2 for I/O-bound work. CPU cores for CPU-bound work.
If you have 4 CPU cores and work is I/O-bound (database, APIs), use 8 workers. If work is CPU-bound (image processing), use 4 workers.
Monitor queue depth. If it’s always growing, add workers. If it’s always empty, you have too many.
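As a quick sketch of that arithmetic (the per-worker throughput is something you should measure, not guess):
// Rule-of-thumb sizing for an I/O-bound service on a 4-core machine.
const cpuCores = 4;
const workers = cpuCores * 2; // I/O-bound: 2× cores
const perWorkerThroughput = 10; // req/sec each — measure this under real load
const capacity = workers * perWorkerThroughput; // ~80 req/sec
const queueSize = capacity * 3; // ~3 seconds of buffer at the full drain rate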
Protecting Databases and External APIs
Databases are usually the bottleneck. External APIs are unpredictable. Both need protection.
Why Databases Are Usually the Bottleneck
Databases have limited connections. Limited CPU. Limited I/O.
When you overload a database:
- Connections exhaust
- Queries queue
- Locks accumulate
- Everything slows down
You can’t scale a database like you scale application servers. You need to protect it.
Query Timeouts and Cancellation
Set timeouts on every query.
async function queryWithTimeout<T>(
query: string,
params: any[],
timeoutMs: number
): Promise<T> {
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), timeoutMs);
try {
const result = await db.query(query, params, { signal: controller.signal });
clearTimeout(timeoutId);
return result;
} catch (error) {
if (error.name === 'AbortError') {
throw new Error('Query timeout');
}
throw error;
}
}
Cancel long-running queries. Don’t let them hang forever.
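Many drivers can enforce this for you as well. With node-postgres, for example, a pool can be configured roughly like this (a sketch; check your driver's options):
import { Pool } from 'pg';
// Cap connections, fail fast when the pool is exhausted, and let the server
// cancel queries that run too long.
const pgPool = new Pool({
max: 20, // hard limit on open connections
connectionTimeoutMillis: 2000, // stop waiting for a free connection after 2s
statement_timeout: 5000, // server-side cancel after 5s
});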
Bulkheads: Separate Pools for Different Traffic
Don’t let one traffic class kill everything else.
class Bulkhead {
private pools: Map<string, ConnectionPool> = new Map();
getPool(trafficClass: string): ConnectionPool {
if (!this.pools.has(trafficClass)) {
this.pools.set(trafficClass, new ConnectionPool({
maxConnections: this.getMaxConnections(trafficClass)
}));
}
return this.pools.get(trafficClass)!;
}
private getMaxConnections(trafficClass: string): number {
switch (trafficClass) {
case 'critical': return 20;
case 'normal': return 10;
case 'best-effort': return 5;
default: return 5;
}
}
}
Critical traffic gets its own pool. If best-effort traffic overwhelms its pool, it doesn’t affect critical traffic.
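Usage might look like this (the pool names follow the traffic classes above):
const bulkhead = new Bulkhead();
// Checkout queries get the critical pool; analytics can only exhaust its own small pool.
const checkoutDb = bulkhead.getPool('critical');
const analyticsDb = bulkhead.getPool('best-effort');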
Throttling Outbound Calls
Limit how fast you call external APIs.
class OutboundThrottle {
private lastCallTime = 0;
private minInterval: number;
constructor(callsPerSecond: number) {
this.minInterval = 1000 / callsPerSecond;
}
async throttle<T>(fn: () => Promise<T>): Promise<T> {
const now = Date.now();
const elapsed = now - this.lastCallTime;
const waitTime = Math.max(0, this.minInterval - elapsed);
if (waitTime > 0) {
await sleep(waitTime);
}
this.lastCallTime = Date.now();
return fn();
}
}
Don’t hammer external APIs. Respect their rate limits. Your system depends on them.
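For example, wrapping every call to an email provider (emailService is a hypothetical client, like the one in the order example below):
// No matter how many workers want to send, the provider sees at most ~5 calls per second.
const emailThrottle = new OutboundThrottle(5);
async function sendReceipt(to: string, body: string) {
return emailThrottle.throttle(() => emailService.send({ to, subject: 'Your receipt', body }));
}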
End-to-End Design Example
Let’s walk through a “Place Order” flow with backpressure at every layer.
Client Retry Rules
class OrderClient {
async placeOrder(order: Order): Promise<OrderResult> {
const maxRetries = 3;
let attempt = 0;
while (attempt < maxRetries) {
try {
const response = await fetch('/api/orders', {
method: 'POST',
body: JSON.stringify(order),
signal: AbortSignal.timeout(5000)
});
if (response.status === 429) {
const retryAfter = parseInt(response.headers.get('Retry-After') || '1');
await sleep(retryAfter * 1000);
attempt++;
continue;
}
if (response.status === 503) {
const retryAfter = parseInt(response.headers.get('Retry-After') || '10');
await sleep(retryAfter * 1000);
attempt++;
continue;
}
if (!response.ok) {
throw new Error(`HTTP ${response.status}`);
}
return await response.json();
} catch (error) {
if (error.name === 'AbortError') {
attempt++;
if (attempt >= maxRetries) throw error;
await sleep(Math.pow(2, attempt) * 1000); // Exponential backoff
continue;
}
throw error;
}
}
throw new Error('Max retries exceeded');
}
}
Respect 429 and 503. Use exponential backoff. Don’t retry forever.
API Concurrency and Queue Limits
const orderLimiter = new ConcurrencyLimiter(50);
const orderQueue = new BoundedQueue<Order>(200);
app.post('/api/orders', async (req, res) => {
const order = req.body;
// Check rate limit
if (!rateLimiter.allow(req.userId)) {
const waitTime = rateLimiter.getWaitTime(req.userId);
return res.status(429)
.setHeader('Retry-After', Math.ceil(waitTime / 1000))
.json({ error: 'Rate limit exceeded' });
}
// Check queue capacity
if (orderQueue.isFull()) {
return res.status(503)
.setHeader('Retry-After', '10')
.json({ error: 'Service overloaded' });
}
// Enqueue with concurrency limit
try {
await orderLimiter.execute(async () => {
if (!orderQueue.enqueue(order)) {
throw new Error('Queue full');
}
});
res.status(202).json({ message: 'Order queued', orderId: order.id });
} catch (error) {
res.status(503).json({ error: 'Service overloaded' });
}
});
Rate limit. Check queue. Enqueue with concurrency limit. Reject if overloaded.
Worker Pool Logic
const workerPool = new WorkerPool(10, 200);
class OrderWorker {
async processOrder(order: Order) {
const context = new RequestContext(30000); // 30 second deadline
try {
// Validate order
await this.validateOrder(order, context);
// Check inventory
const inventory = await this.checkInventory(order.items, context);
if (!inventory.available) {
throw new Error('Out of stock');
}
// Process payment
const payment = await this.processPayment(order, context);
if (!payment.success) {
throw new Error('Payment failed');
}
// Create order
const orderRecord = await this.createOrder(order, payment, context);
// Send confirmation
await this.sendConfirmation(orderRecord, context);
return orderRecord;
} catch (error) {
if (context.isExpired()) {
throw new Error('Request timeout');
}
throw error;
}
}
private async validateOrder(order: Order, context: RequestContext) {
if (context.isExpired()) throw new Error('Timeout');
// Validation logic
}
private async checkInventory(items: Item[], context: RequestContext) {
if (context.isExpired()) throw new Error('Timeout');
return await db.queryWithTimeout(
'SELECT * FROM inventory WHERE item_id IN (...)',
[items.map(i => i.id)],
context.getRemainingTime()
);
}
private async processPayment(order: Order, context: RequestContext) {
if (context.isExpired()) throw new Error('Timeout');
return await paymentCircuitBreaker.execute(async () => {
return await paymentService.charge(order, {
timeout: context.getRemainingTime()
});
});
}
private async createOrder(order: Order, payment: Payment, context: RequestContext) {
if (context.isExpired()) throw new Error('Timeout');
return await db.queryWithTimeout(
'INSERT INTO orders ...',
[order, payment],
context.getRemainingTime()
);
}
private async sendConfirmation(order: Order, context: RequestContext) {
if (context.isExpired()) return; // Best effort, don't fail
try {
await emailService.send({
to: order.email,
subject: 'Order confirmed',
body: `Your order ${order.id} is confirmed`
}, {
timeout: context.getRemainingTime()
});
} catch (error) {
// Log but don't fail
console.error('Failed to send confirmation', error);
}
}
}
Fixed workers. Bounded queue. Timeouts everywhere. Circuit breakers for external calls.
Database Timeout and Retry Rules
class DatabasePool {
private pool: Connection[];
constructor(private maxConnections: number) {
// Pre-create a fixed set of connections; the pool never grows beyond this.
this.pool = Array.from({ length: maxConnections }, () => new Connection());
}
async getConnection(timeoutMs: number): Promise<Connection> {
const start = Date.now();
// Wait for a connection to be returned, but never longer than the caller's deadline.
while (this.pool.length === 0 && Date.now() - start < timeoutMs) {
await sleep(10);
}
if (this.pool.length === 0) {
throw new Error('Connection pool exhausted');
}
return this.pool.pop()!;
}
async query<T>(sql: string, params: any[], timeoutMs: number): Promise<T> {
const connection = await this.getConnection(timeoutMs);
try {
return await connection.queryWithTimeout(sql, params, timeoutMs);
} finally {
// Always return the connection, even when the query fails.
this.pool.push(connection);
}
}
}
Connection pool with timeout. Query timeout. Return connections to pool.
System Behavior Under Load
Normal load (100 req/sec):
- Queue depth: 10-20
- Response time: 200ms
- Database connections: 20/50
- Rejections: 0
Sudden 5x spike (500 req/sec):
- Queue depth: 150-200 (hits limit)
- Response time: 500ms
- Database connections: 45/50
- Rejections: 300 req/sec (429/503)
- System stays stable
Downstream outage (payment service down):
- Circuit breaker opens after 5 failures
- Payment calls fail fast
- Orders queue but don’t process
- Queue depth grows
- After 10 seconds, queue full, new orders rejected with 503
- System stays stable, doesn’t crash
Backpressure at every layer keeps the system stable.
Observability for Backpressure
You can’t manage what you can’t measure. Track backpressure metrics.
What to Measure
Queue depth: How many items are waiting. Track min, max, avg, p95, p99.
class QueueMetrics {
private depths: number[] = [];
recordDepth(depth: number) {
this.depths.push(depth);
if (this.depths.length > 1000) {
this.depths.shift();
}
}
getStats() {
const sorted = [...this.depths].sort((a, b) => a - b);
return {
min: sorted[0],
max: sorted[sorted.length - 1],
avg: sorted.reduce((a, b) => a + b, 0) / sorted.length,
p95: sorted[Math.floor(sorted.length * 0.95)],
p99: sorted[Math.floor(sorted.length * 0.99)]
};
}
}
In-flight requests: How many requests are being processed right now.
class InFlightTracker {
private count = 0;
start() {
this.count++;
}
finish() {
this.count--;
}
getCount() {
return this.count;
}
}
Latency percentiles: Track p50, p95, p99 latency per dependency.
class LatencyTracker {
private latencies: Map<string, number[]> = new Map();
record(dependency: string, latency: number) {
if (!this.latencies.has(dependency)) {
this.latencies.set(dependency, []);
}
this.latencies.get(dependency)!.push(latency);
}
getPercentiles(dependency: string) {
const latencies = this.latencies.get(dependency) || [];
const sorted = [...latencies].sort((a, b) => a - b);
return {
p50: sorted[Math.floor(sorted.length * 0.5)],
p95: sorted[Math.floor(sorted.length * 0.95)],
p99: sorted[Math.floor(sorted.length * 0.99)]
};
}
}
Rejection counts: Track why requests are rejected.
class RejectionTracker {
private counts: Map<string, number> = new Map();
record(reason: string) {
this.counts.set(reason, (this.counts.get(reason) || 0) + 1);
}
getCounts() {
return Object.fromEntries(this.counts);
}
}
Reasons: rate_limit, queue_full, timeout, circuit_open, connection_pool_exhausted.
How to Alert Without Noise
Don’t alert on every spike. Alert on sustained problems.
Queue growth: Alert if queue depth > threshold for 5 minutes.
class QueueAlert {
private violations = 0;
check(depth: number, threshold: number) {
if (depth > threshold) {
this.violations++;
if (this.violations >= 5) { // 5 consecutive checks
this.alert('Queue depth exceeded threshold');
}
} else {
this.violations = 0;
}
}
}
Rising latency: Alert if p95 latency increases by 2x for 5 minutes.
class LatencyAlert {
private baseline: number | null = null;
private violations = 0;
check(p95Latency: number) {
if (this.baseline === null) {
this.baseline = p95Latency;
return;
}
if (p95Latency > this.baseline * 2) {
this.violations++;
if (this.violations >= 5) {
this.alert('Latency increased significantly');
}
} else {
this.violations = 0;
this.baseline = p95Latency; // Update baseline
}
}
}
Rejection rate: Alert if rejection rate > 10% for 5 minutes.
class RejectionAlert {
private rejections = 0;
private total = 0;
record(rejected: boolean) {
this.total++;
if (rejected) this.rejections++;
}
check() {
const rate = this.rejections / this.total;
if (rate > 0.1 && this.total > 100) {
this.alert(`Rejection rate: ${(rate * 100).toFixed(1)}%`);
}
// Reset every 5 minutes
this.rejections = 0;
this.total = 0;
}
}
Alert on sustained problems, not transient spikes.
Practical Checklist
Use this checklist when designing backpressure.
Client:
- Rate limiting with exponential backoff
- Respect 429 and 503 status codes
- Respect Retry-After headers
- Request timeouts
- Request queuing for non-critical requests
API Layer:
- Per-client rate limits (token bucket or leaky bucket)
- Global concurrency limits per endpoint
- Request timeouts and deadlines
- Bounded request queues
- Proper HTTP status codes (429, 503)
Queues:
- Bounded queue size (2-3x worker capacity)
- Priority queues for different traffic classes
- Dead letter queues for failed jobs
- Queue depth monitoring
Worker Pools:
- Fixed worker count (CPU cores × 2 for I/O-bound)
- Worker health checks
- Graceful shutdown
- In-flight request tracking
Database:
- Connection pooling with limits
- Query timeouts
- Query cancellation
- Bulkheads for different traffic classes
- Read replicas for read-heavy workloads
External APIs:
- Circuit breakers
- Request timeouts
- Rate limiting/throttling
- Fallback responses
- Retry with exponential backoff
Observability:
- Queue depth metrics
- In-flight request counts
- Latency percentiles per dependency
- Rejection counts by reason
- Alerts on sustained problems
Testing:
- Load testing with traffic spikes
- Chaos testing with downstream failures
- Verify backpressure works at each layer
- Verify system recovers after overload
If you can check all these, your system will handle traffic spikes gracefully.
Conclusion
Backpressure isn’t optional. It’s essential.
When traffic spikes or downstream services fail, systems without backpressure crash. Systems with backpressure slow down gracefully.
The patterns are simple:
- Rate limit at the edge
- Bound your queues
- Limit your concurrency
- Timeout everything
- Use circuit breakers
- Monitor and alert
But most teams skip them. They focus on scaling up. They forget about pushing back.
Don’t make that mistake. Design backpressure from the start. Your system will thank you.