By Yusuf Elborey

Idempotency by Design: Building 'Exactly-Once' Effects on 'At-Least-Once' Rails

Tags: idempotency, distributed-systems, exactly-once, at-least-once, kafka, postgresql, system-design, reliability

Distributed systems retry. Networks drop. Without idempotency, users pay twice or you create duplicate orders.

This article shows a practical, end-to-end blueprint for building idempotent systems. Not theory. Working code you can use.

Why This Matters

You build a payment API. A user clicks “Pay” once. The network drops. The client retries. Your server processes it twice. The user gets charged twice.

Or you have a message queue. A consumer crashes mid-processing. It restarts. It processes the same message again. You create duplicate records.

These aren’t edge cases. They happen every day in production.

The problem isn’t that systems retry. Retries are good. The problem is that your code isn’t ready for them.

Idempotency fixes this. It makes retries safe. If you call the same operation twice with the same key, it only happens once.

What “Exactly-Once” Really Means

“Exactly-once” is a promise. It means: this operation will execute exactly once, even if you call it multiple times.

But here’s the thing: you can’t guarantee exactly-once delivery. Networks drop. Services crash. Messages get duplicated.

What you can guarantee is exactly-once effects. Even if the message arrives twice, the effect happens once.

That’s idempotency.

Where Duplicates Come From

Client retries: User clicks a button. Network is slow. Client times out. Client retries. Server processes both requests.

502s and timeouts: Server processes request. Response gets lost. Client retries. Server processes again.

Consumer restarts: Consumer processes message. Crashes before acknowledging. Restarts. Processes same message again.

Batch replays: You replay a batch of messages. Some were already processed. They get processed again.

Clock skew: Two servers have different clocks. They generate the same idempotency key. They conflict.

All of these are normal. Your system needs to handle them.

Idempotency Keys at the Edge

The first line of defense is idempotency keys. The client sends a unique key with each request. The server uses it to deduplicate.

How to Choose Keys

You have two options: business keys or opaque UUIDs.

Business keys: Use something meaningful. Order ID. User ID + timestamp. Payment ID.

// Business key example
const idempotencyKey = `order-${orderId}-${userId}`;

Good when:

  • You want to prevent duplicates across different clients
  • The key has business meaning
  • You need to debug by business context

Bad when:

  • Multiple clients might generate the same key
  • The key changes between retries
  • Clock skew causes collisions

Opaque UUIDs: Generate a random UUID. Client generates it once. Sends it with every retry.

// Opaque UUID example
const idempotencyKey = crypto.randomUUID();

Good when:

  • Each client generates unique keys
  • You don’t need business meaning
  • You want to avoid collisions

Bad when:

  • Client generates a new UUID on retry (defeats the purpose)
  • You need to debug by business context

Best practice: Use business keys when you can. Use UUIDs as fallback. Document which one you use.
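
To make the "generate once, reuse on retry" rule concrete, here is a minimal client-side sketch. The endpoint URL, backoff, and attempt count are illustrative; fetch assumes Node 18+.

const crypto = require('crypto');

async function payWithRetries(payload, maxAttempts = 3) {
  // Generate the key ONCE, outside the retry loop
  const idempotencyKey = crypto.randomUUID();

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const res = await fetch('https://api.example.com/payments', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'Idempotency-Key': idempotencyKey // same key on every attempt
        },
        body: JSON.stringify(payload)
      });
      if (res.ok) return await res.json();
    } catch (err) {
      // Network error: retrying is safe because the key is unchanged
    }
    await new Promise((r) => setTimeout(r, 2 ** attempt * 100)); // backoff
  }
  throw new Error('Payment failed after retries');
}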

TTLs and Storage

Idempotency keys need storage. And they need expiration.

Storage options:

  • Redis: Fast. Good for high throughput. But data can be lost.
  • Database: Durable. But slower. Good for critical operations.
  • Hybrid: Check Redis first. Fall back to database.
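
A minimal sketch of the hybrid lookup, assuming an ioredis-style client and a node-postgres pool (both names are illustrative):

async function findIdempotencyRecord(key) {
  // Fast path: Redis
  const cached = await redis.get(`idem:${key}`);
  if (cached) return JSON.parse(cached);

  // Durable path: database
  const { rows } = await pool.query(
    'SELECT result FROM idempotency_keys WHERE key = $1 AND expires_at > NOW()',
    [key]
  );
  if (rows.length === 0) return null;

  // Re-warm the cache so the next retry hits the fast path
  await redis.set(`idem:${key}`, JSON.stringify(rows[0].result), 'EX', 3600);
  return rows[0].result;
}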

TTLs: Keys shouldn’t live forever. Set a TTL based on your use case.

// TTL examples
const TTL = {
  payment: 24 * 60 * 60,      // 24 hours
  order: 7 * 24 * 60 * 60,    // 7 days
  notification: 60 * 60       // 1 hour
};

Too short: Late retries arrive after the key has expired and get processed again as new requests. Too long: Storage fills up, and stale keys can match unrelated new requests (false positives).

Rule of thumb: Set TTL to 2-3x your maximum retry window. If clients retry for up to 1 hour, set TTL to 2-3 hours.

Failure Modes

Key reuse: Client reuses a key for a different operation. Server returns cached result. Wrong result.

Fix: Include operation type in the key. Or validate the request matches the cached request.
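
A minimal sketch of the validation approach, assuming the idempotency record stores a hash of the request body (the request_hash column in the schema later in this article); the helper names are illustrative:

const crypto = require('crypto');

function requestHash(body) {
  // Hash of the serialized body; assumes stable key ordering on the client
  return crypto.createHash('sha256').update(JSON.stringify(body)).digest('hex');
}

async function assertKeyMatchesRequest(tx, key, body) {
  const { rows } = await tx.query(
    'SELECT request_hash FROM idempotency_keys WHERE key = $1',
    [key]
  );
  if (rows.length > 0 && rows[0].request_hash !== requestHash(body)) {
    // Same key, different payload: the client reused a key incorrectly
    throw new Error('Idempotency-Key reused with a different request body');
  }
}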

Clock skew: Two servers have different clocks. They generate the same timestamp-based key. Collision.

Fix: Use UUIDs. Or include server ID in the key. Or use a distributed ID generator.

Storage failure: Redis goes down. Database is slow. You can’t check idempotency.

Fix: Fail open or fail closed? Fail closed is safer (reject requests). But it breaks availability. Fail open is more available but allows duplicates. Choose based on your risk tolerance.
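
A sketch of that choice as a single flag. findIdempotencyRecord is the hybrid lookup sketched earlier; the flag and error message are illustrative:

const FAIL_CLOSED = true; // safer for payments; set false to favor availability

async function checkIdempotencyOrFail(key) {
  try {
    return await findIdempotencyRecord(key);
  } catch (err) {
    if (FAIL_CLOSED) {
      // Reject the request: no duplicates, but reduced availability
      throw new Error('Idempotency store unavailable, rejecting request');
    }
    // Fail open: proceed as if the key is unseen, accepting duplicate risk
    return null;
  }
}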

Persistence Patterns

Idempotency keys are the first layer. You also need persistence patterns. These handle the “what if the server crashes mid-operation” problem.

Outbox Pattern (Producer)

You need to publish a message. But you also need to update a database. These should happen together. But they’re in different systems.

The outbox pattern fixes this.

How it works:

  1. Write to database and outbox table in one transaction
  2. Background job reads from outbox
  3. Publishes to message queue
  4. Marks as published

-- Outbox table
CREATE TABLE outbox (
  id BIGSERIAL PRIMARY KEY,
  message_id UUID NOT NULL UNIQUE,
  topic VARCHAR(255) NOT NULL,
  payload JSONB NOT NULL,
  created_at TIMESTAMP NOT NULL DEFAULT NOW(),
  published_at TIMESTAMP,
  retry_count INT DEFAULT 0
);

CREATE INDEX idx_outbox_unpublished ON outbox (created_at) 
WHERE published_at IS NULL;

Why it works: The database transaction ensures atomicity. Either the business write and the outbox record both succeed, or neither does. A background job handles publishing; if it fails, it retries (a relay sketch follows the write path below).

Implementation:

async function createOrder(orderData, idempotencyKey) {
  return await db.transaction(async (tx) => {
    // Check idempotency
    const existing = await tx.query(
      'SELECT result FROM idempotency_keys WHERE key = $1',
      [idempotencyKey]
    );
    
    if (existing.rows.length > 0) {
      // node-postgres returns JSONB columns already parsed
      return existing.rows[0].result;
    }
    
    // Create order
    const order = await tx.query(
      'INSERT INTO orders (user_id, amount) VALUES ($1, $2) RETURNING *',
      [orderData.userId, orderData.amount]
    );
    
    // Write to outbox
    await tx.query(
      'INSERT INTO outbox (message_id, topic, payload) VALUES ($1, $2, $3)',
      [
        crypto.randomUUID(),
        'order.created',
        JSON.stringify({ orderId: order.rows[0].id })
      ]
    );
    
    // Store idempotency result
    await tx.query(
      'INSERT INTO idempotency_keys (key, result, expires_at) VALUES ($1, $2, $3)',
      [
        idempotencyKey,
        JSON.stringify(order.rows[0]),
        new Date(Date.now() + 24 * 60 * 60 * 1000)
      ]
    );
    
    return order.rows[0];
  });
}
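
The write path above covers step 1. Steps 2-4 live in a background relay. A minimal sketch, assuming a kafkajs-style producer (SKIP LOCKED lets several relay workers run without double-claiming rows):

async function publishOutbox() {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');

    // Claim a batch of unpublished rows; SKIP LOCKED avoids worker contention
    const { rows } = await client.query(
      `SELECT id, message_id, topic, payload FROM outbox
       WHERE published_at IS NULL
       ORDER BY created_at
       LIMIT 100
       FOR UPDATE SKIP LOCKED`
    );

    for (const row of rows) {
      // Send message_id as the key so consumers can deduplicate (inbox pattern)
      await producer.send({
        topic: row.topic,
        messages: [{ key: row.message_id, value: JSON.stringify(row.payload) }]
      });
      await client.query(
        'UPDATE outbox SET published_at = NOW() WHERE id = $1',
        [row.id]
      );
    }

    await client.query('COMMIT');
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}

setInterval(() => publishOutbox().catch(console.error), 1000);

Note the relay itself is only at-least-once: a crash between send and COMMIT republishes the row. That is exactly why the consumer needs the inbox pattern below.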

Inbox Pattern (Consumer)

You consume messages. But what if you process the same message twice?

The inbox pattern deduplicates at the consumer.

How it works:

  1. Consumer receives message
  2. Checks inbox table for message_id
  3. If exists, skip
  4. If not, process and insert into inbox
  5. Acknowledge message

-- Inbox table
CREATE TABLE inbox (
  message_id VARCHAR(255) PRIMARY KEY,
  topic VARCHAR(255) NOT NULL,
  payload JSONB NOT NULL,
  processed_at TIMESTAMP NOT NULL DEFAULT NOW(),
  processing_duration_ms INT
);

CREATE INDEX idx_inbox_processed ON inbox (processed_at);

Why it works: The unique constraint on message_id prevents duplicates. Even if the consumer processes the same message twice, the second insert fails. Safe.

Implementation:

func (c *Consumer) ProcessMessage(msg kafka.Message) error {
    messageID := string(msg.Key)
    
    // Check inbox
    var existing string
    err := c.db.QueryRow(
        "SELECT message_id FROM inbox WHERE message_id = $1",
        messageID,
    ).Scan(&existing)
    
    if err == nil {
        // Already processed
        log.Printf("Message %s already processed, skipping", messageID)
        return nil
    }
    
    if err != sql.ErrNoRows {
        return err
    }
    
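    // NOTE: if the process crashes after handleMessage succeeds but before
    // the inbox insert below, the message is reprocessed on restart. Make
    // the handler idempotent, or run the processing and the insert in one
    // database transaction.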
    // Process message
    start := time.Now()
    if err := c.handleMessage(msg); err != nil {
        return err
    }
    duration := time.Since(start)
    
    // Insert into inbox
    _, err = c.db.Exec(
        "INSERT INTO inbox (message_id, topic, payload, processing_duration_ms) VALUES ($1, $2, $3, $4)",
        messageID,
        msg.Topic,
        msg.Value,
        duration.Milliseconds(),
    )
    
    if err != nil {
        // Might be a race condition - check again
        var check string
        checkErr := c.db.QueryRow(
            "SELECT message_id FROM inbox WHERE message_id = $1",
            messageID,
        ).Scan(&check)
        
        if checkErr == nil {
            // Another consumer processed it - that's fine
            return nil
        }
        
        return err
    }
    
    return nil
}

Unique Constraints vs. Conditional Writes

You have two options for deduplication: unique constraints or conditional writes.

Unique constraints: Database enforces uniqueness. Insert fails if duplicate.

CREATE TABLE idempotency_keys (
  key VARCHAR(255) PRIMARY KEY,
  result JSONB NOT NULL,
  expires_at TIMESTAMP NOT NULL
);

Pros: Simple. Database handles it. Cons: Insert failures need handling. Race conditions possible.
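
Handling that insert failure in node-postgres looks roughly like this (23505 is PostgreSQL's unique_violation error code; key, result, and expiresAt are assumed to be in scope):

try {
  await pool.query(
    'INSERT INTO idempotency_keys (key, result, expires_at) VALUES ($1, $2, $3)',
    [key, JSON.stringify(result), expiresAt]
  );
} catch (err) {
  if (err.code === '23505') {
    // Unique violation: another request won the race; serve its stored result
    const { rows } = await pool.query(
      'SELECT result FROM idempotency_keys WHERE key = $1',
      [key]
    );
    return rows[0].result;
  }
  throw err;
}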

Conditional writes: Use the database's conditional insert/upsert features rather than a separate check-then-insert, which is racy on its own.

-- PostgreSQL example
INSERT INTO idempotency_keys (key, result, expires_at)
VALUES ($1, $2, $3)
ON CONFLICT (key) DO UPDATE SET
  result = EXCLUDED.result,
  expires_at = EXCLUDED.expires_at
WHERE idempotency_keys.expires_at < NOW();

Pros: Handles conflicts gracefully. Can update expired keys. Cons: More complex. Database-specific.

Best practice: Use unique constraints for simplicity. Use conditional writes when you need more control.

Transaction Boundaries

Transactions matter. Get them wrong, and you get race conditions.

Problem:

// WRONG: Race condition
const existing = await checkIdempotency(key);
if (!existing) {
  const result = await processRequest();
  await storeIdempotency(key, result);
}

Two requests can both pass the check. Both process. Both store. Duplicate.

Solution: Do everything in one transaction.

// RIGHT: Atomic
await db.transaction(async (tx) => {
  const existing = await tx.query(
    'SELECT result FROM idempotency_keys WHERE key = $1 FOR UPDATE',
    [key]
  );
  
  if (existing.rows.length > 0) {
    return existing.rows[0].result; // JSONB comes back already parsed
  }
  
  const result = await processRequest();
  await tx.query(
    'INSERT INTO idempotency_keys (key, result) VALUES ($1, $2)',
    [key, JSON.stringify(result)]
  );
  
  return result;
});

FOR UPDATE locks the row. Second request waits. First request completes. Second request sees the result. No duplicate.

Messaging Specifics

Different message queues have different guarantees. Know what you’re working with.

Kafka “EOS” (Exactly-Once Semantics)

Kafka supports exactly-once semantics. But it’s not magic. You need to configure it.

Transactions: Enable transactions. Producer sends messages in a transaction. Either all succeed or all fail.

config := sarama.NewConfig()
config.Version = sarama.V2_5_0_0                 // transactions need a modern protocol version
config.Producer.Transaction.ID = "my-producer-id"
config.Producer.Idempotent = true
config.Producer.RequiredAcks = sarama.WaitForAll // required for idempotence
config.Producer.Retry.Max = 5
config.Producer.Return.Successes = true          // required by SyncProducer
config.Net.MaxOpenRequests = 1                   // required for idempotence

producer, err := sarama.NewSyncProducer(brokers, config)
if err != nil {
    return err
}

if err := producer.BeginTxn(); err != nil {
    return err
}
if _, _, err := producer.SendMessage(&sarama.ProducerMessage{
    Topic: "orders",
    Key:   sarama.StringEncoder(orderID),
    Value: sarama.ByteEncoder(payload),
}); err != nil {
    _ = producer.AbortTxn()
    return err
}
if err := producer.CommitTxn(); err != nil {
    return err
}

Producer ID and sequence numbers: Kafka tracks producer ID and sequence numbers. Duplicate messages get rejected.

Limitations:

  • Only works within a transaction
  • Requires idempotent producer
  • Consumer still needs deduplication (inbox pattern)

What you can actually promise: Kafka EOS prevents duplicates from producer retries. It doesn’t prevent duplicates from consumer restarts. You still need inbox pattern.

SQS/Kinesis with Dedupe Windows

SQS FIFO queues have built-in message deduplication. Kinesis does not; deduplication is on you.

SQS:

  • Deduplication ID: Client provides. SQS deduplicates within 5 minutes.
  • Content-based deduplication: SQS hashes message body. Deduplicates within 5 minutes.
const params = {
  QueueUrl: queueUrl,
  MessageBody: JSON.stringify(data),
  MessageDeduplicationId: idempotencyKey,
  MessageGroupId: 'orders'
};

await sqs.sendMessage(params).promise();

Kinesis:

  • Partition key determines shard
  • Same partition key goes to same shard
  • Consumer needs deduplication (inbox pattern)
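
A minimal producer-side sketch with the AWS SDK v2 (the stream name is illustrative); note the dedupe burden stays with the consumer, so shipping a message ID in the payload helps:

await kinesis.putRecord({
  StreamName: 'orders',
  PartitionKey: orderId, // same key -> same shard -> per-key ordering
  Data: JSON.stringify({ messageId: crypto.randomUUID(), ...payload })
}).promise();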

Limitations:

  • SQS deduplication window is 5 minutes
  • Kinesis doesn’t deduplicate automatically
  • Both need consumer-side deduplication for longer windows

Exactly-Once Illusions

“Exactly-once” is marketing. What you actually get:

Kafka EOS:

  • Exactly-once within a transaction
  • Idempotent producer prevents duplicates
  • Consumer still needs deduplication

SQS deduplication:

  • Deduplication within 5 minutes
  • After 5 minutes, duplicates possible
  • Consumer needs deduplication

Reality: You can’t guarantee exactly-once delivery. You can guarantee exactly-once effects. That’s idempotency.

HTTP + DB Example (End-to-End)

Let’s build a complete example. A REST endpoint with idempotency keys. Database schema. Safe retries.

REST Endpoint with Idempotency-Key

const express = require('express');
const { Pool } = require('pg');
const crypto = require('crypto');

const app = express();
app.use(express.json());

const pool = new Pool({
  connectionString: process.env.DATABASE_URL
});

async function handleRequest(req, res) {
  const idempotencyKey = req.headers['idempotency-key'];
  
  if (!idempotencyKey) {
    return res.status(400).json({ error: 'Idempotency-Key header required' });
  }
  
  const client = await pool.connect();
  
  try {
    await client.query('BEGIN');
    
    // Check for existing result
    const existing = await client.query(
      `SELECT result, status FROM idempotency_keys 
       WHERE key = $1 AND expires_at > NOW() FOR UPDATE`,
      [idempotencyKey]
    );
    
    if (existing.rows.length > 0) {
      const { result, status } = existing.rows[0];
      
      if (status === 'completed') {
        await client.query('COMMIT');
        return res.status(200).json(result); // JSONB comes back already parsed
      }
      
      if (status === 'processing') {
        await client.query('COMMIT');
        return res.status(409).json({ error: 'Request already processing' });
      }
    }
    
    // Mark as processing; refresh expires_at so a reused expired row gets a fresh window
    await client.query(
      `INSERT INTO idempotency_keys (key, status, expires_at)
       VALUES ($1, 'processing', NOW() + INTERVAL '24 hours')
       ON CONFLICT (key) DO UPDATE
       SET status = 'processing', expires_at = NOW() + INTERVAL '24 hours'`,
      [idempotencyKey]
    );
    
    await client.query('COMMIT');
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
  
  // Process request (outside transaction to avoid long locks)
  let result;
  try {
    result = await processOrder(req.body);
  } catch (err) {
    // Mark as failed
    await pool.query(
      `UPDATE idempotency_keys SET status = 'failed', error = $1 
       WHERE key = $2`,
      [err.message, idempotencyKey]
    );
    return res.status(500).json({ error: err.message });
  }
  
  // Store result
  await pool.query(
    `UPDATE idempotency_keys 
     SET status = 'completed', result = $1 
     WHERE key = $2`,
    [JSON.stringify(result), idempotencyKey]
  );
  
  res.status(201).json(result);
}

async function processOrder(orderData) {
  // Your business logic here
  const order = {
    id: crypto.randomUUID(),
    userId: orderData.userId,
    amount: orderData.amount,
    status: 'created',
    createdAt: new Date()
  };
  
  // Write to database
  await pool.query(
    'INSERT INTO orders (id, user_id, amount, status) VALUES ($1, $2, $3, $4)',
    [order.id, order.userId, order.amount, order.status]
  );
  
  return order;
}

app.post('/orders', handleRequest);

app.listen(3000, () => {
  console.log('Server running on port 3000');
});

DB Schema for Idempotency Records

-- Idempotency keys table
CREATE TABLE idempotency_keys (
  key VARCHAR(255) PRIMARY KEY,
  status VARCHAR(50) NOT NULL, -- 'processing', 'completed', 'failed'
  result JSONB,
  error TEXT,
  created_at TIMESTAMP NOT NULL DEFAULT NOW(),
  expires_at TIMESTAMP NOT NULL,
  request_hash VARCHAR(64) -- Hash of request body for validation
);

CREATE INDEX idx_idempotency_expires ON idempotency_keys (expires_at);

-- Orders table
CREATE TABLE orders (
  id UUID PRIMARY KEY,
  user_id UUID NOT NULL,
  amount DECIMAL(10, 2) NOT NULL,
  status VARCHAR(50) NOT NULL,
  created_at TIMESTAMP NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_orders_user ON orders (user_id);
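
The idx_idempotency_expires index exists to keep cleanup cheap. A minimal cleanup job using the pool from the endpoint above, run on an interval (the hourly schedule is illustrative; cron or pg_cron work equally well):

setInterval(async () => {
  try {
    const { rowCount } = await pool.query(
      'DELETE FROM idempotency_keys WHERE expires_at < NOW()'
    );
    if (rowCount > 0) {
      console.log(`Purged ${rowCount} expired idempotency keys`);
    }
  } catch (err) {
    console.error('Idempotency cleanup failed', err);
  }
}, 60 * 60 * 1000); // hourly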

Safe Retries Across 3 Failure Points

Failure point 1 (before processing): Client sends request. Server receives it. Checks idempotency. Finds an existing record. Returns the cached result. Safe.

Failure point 2 (during processing): Client sends request. Server receives it. Starts processing. Crashes. Client retries. Server checks idempotency. Sees "processing". Returns 409. Client backs off and retries later. Safe, with one caveat: expire stale "processing" records after a timeout, or a crash mid-processing blocks retries forever.

Failure point 3 (after processing): Client sends request. Server receives it. Processes. Stores the result. Response is lost. Client retries. Server checks idempotency. Finds the result. Returns the cached result. Safe.

All three are handled.

Workflows and Sagas

Idempotency gets complex in workflows. Multiple steps. Multiple services. Multiple failure points.

How Idempotency Fits with Saga Steps

A saga is a distributed transaction. Multiple steps. Each step can fail. You need compensation.

Example:

  1. Create order
  2. Reserve inventory
  3. Charge payment
  4. Send notification

If step 3 fails, compensate: release inventory, cancel order.

Idempotency per step: Each step needs its own idempotency key.

const sagaId = crypto.randomUUID();

// Step 1: Create order
const orderKey = `${sagaId}:create-order`;
const order = await createOrderWithIdempotency(orderData, orderKey);

// Step 2: Reserve inventory
const inventoryKey = `${sagaId}:reserve-inventory:${order.id}`;
await reserveInventoryWithIdempotency(order.id, inventoryKey);

// Step 3: Charge payment
const paymentKey = `${sagaId}:charge-payment:${order.id}`;
await chargePaymentWithIdempotency(order.id, paymentKey);

// Step 4: Send notification
const notificationKey = `${sagaId}:send-notification:${order.id}`;
await sendNotificationWithIdempotency(order.id, notificationKey);

Saga-level idempotency: Also track the saga itself. Prevent running the same saga twice.

CREATE TABLE saga_executions (
  saga_id VARCHAR(255) PRIMARY KEY,
  status VARCHAR(50) NOT NULL,
  current_step INT,
  steps JSONB NOT NULL,
  created_at TIMESTAMP NOT NULL DEFAULT NOW()
);
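
A minimal sketch of driving steps off that table, so a restarted saga resumes at the step where it stopped. The steps array, its name fields, and its run functions are illustrative:

async function runSaga(sagaId, steps) {
  // Claim or resume: ON CONFLICT makes re-running the same sagaId safe
  await pool.query(
    `INSERT INTO saga_executions (saga_id, status, current_step, steps)
     VALUES ($1, 'running', 0, $2)
     ON CONFLICT (saga_id) DO NOTHING`,
    [sagaId, JSON.stringify(steps.map((s) => s.name))]
  );

  const { rows } = await pool.query(
    'SELECT status, current_step FROM saga_executions WHERE saga_id = $1',
    [sagaId]
  );
  if (rows[0].status === 'completed') return; // whole saga already ran

  // Resume from the recorded step; each step deduplicates via its own key
  for (let i = rows[0].current_step; i < steps.length; i++) {
    await steps[i].run(`${sagaId}:${steps[i].name}`);
    await pool.query(
      'UPDATE saga_executions SET current_step = $1 WHERE saga_id = $2',
      [i + 1, sagaId]
    );
  }

  await pool.query(
    `UPDATE saga_executions SET status = 'completed' WHERE saga_id = $1`,
    [sagaId]
  );
}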

Compensations

Compensations need idempotency too. If you compensate twice, you undo too much.

async function compensateOrder(orderId, sagaId) {
  const compensationKey = `${sagaId}:compensate:${orderId}`;
  
  // Check if already compensated
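  // (Production note: run this check, the compensation, and the store below
  // in one transaction, as in Transaction Boundaries above, to avoid the race.)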
  const existing = await checkIdempotency(compensationKey);
  if (existing) {
    return existing;
  }
  
  // Compensate
  await releaseInventory(orderId);
  await cancelOrder(orderId);
  
  // Store compensation
  await storeIdempotency(compensationKey, { status: 'compensated' });
}

Observability

You can’t fix what you can’t see. Log idempotency keys. Track dedupe rates. Alert on anomalies.

Logging the Key

Log the idempotency key with every request. Makes debugging possible.

logger.info('Processing request', {
  idempotencyKey,
  userId,
  operation: 'create_order',
  traceId
});

Dedupe Rates

Track how often you deduplicate. High rates mean lots of retries. Might indicate problems.

metrics.increment('idempotency.duplicate', {
  operation: 'create_order'
});

Anomaly Alerts

Alert on:

  • Sudden spike in duplicates (might indicate client bug)
  • High duplicate rate for specific operation (might indicate timeout issue)
  • Idempotency check failures (might indicate storage problem)

if (duplicateRate > 0.1) {
  alert.send('High duplicate rate detected', {
    rate: duplicateRate,
    operation: 'create_order'
  });
}

Testing

Test idempotency. Not just happy path. Test duplicates. Test failures. Test race conditions.

Deterministic Replay Tests

Replay the same request multiple times. Verify same result.

test('idempotency: same request returns same result', async () => {
  const key = 'test-key-123';
  const request = { userId: 'user-1', amount: 100 };
  
  const result1 = await createOrder(request, key);
  const result2 = await createOrder(request, key);
  const result3 = await createOrder(request, key);
  
  expect(result1.id).toBe(result2.id);
  expect(result2.id).toBe(result3.id);
});

Chaos and Forced Duplicates

Inject duplicates. See what happens.

test('idempotency: concurrent duplicates', async () => {
  const key = 'test-key-456';
  const request = { userId: 'user-2', amount: 200 };
  
  // Send 10 concurrent requests with same key
  const results = await Promise.all(
    Array(10).fill(null).map(() => createOrder(request, key))
  );
  
  // All should return same result
  const ids = results.map(r => r.id);
  const uniqueIds = new Set(ids);
  expect(uniqueIds.size).toBe(1);
});

Load Test Snippet

Test under load. See if idempotency holds.

// k6 load test
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  vus: 100,
  duration: '30s',
};

export default function () {
  const idempotencyKey = `test-${__VU}-${__ITER}`;
  const payload = JSON.stringify({
    userId: 'user-123',
    amount: 100
  });
  
  const params = {
    headers: {
      'Content-Type': 'application/json',
      'Idempotency-Key': idempotencyKey,
    },
  };
  
  const res = http.post('http://localhost:3000/orders', payload, params);
  
  check(res, {
    'status is 201 or 200': (r) => r.status === 201 || r.status === 200,
    'response has order id': (r) => {
      const body = JSON.parse(r.body);
      return body.id !== undefined;
    },
  });
  
  // Retry same request
  const res2 = http.post('http://localhost:3000/orders', payload, params);
  
  check(res2, {
    'retry returns same result': (r) => {
      const body1 = JSON.parse(res.body);
      const body2 = JSON.parse(r.body);
      return body1.id === body2.id;
    },
  });
}

Checklist

Here’s a go-live checklist. Paste it into your PR.

Idempotency Keys:

  • Idempotency-Key header required for write operations
  • Keys are validated (format, length)
  • Keys include operation type or are operation-specific
  • TTL set appropriately (2-3x max retry window)

Storage:

  • Idempotency storage chosen (Redis, DB, or hybrid)
  • Unique constraints on idempotency key
  • Indexes on expiration for cleanup
  • Cleanup job for expired keys

Transactions:

  • Idempotency check in same transaction as business logic
  • FOR UPDATE used to prevent race conditions
  • Status tracking (processing, completed, failed)

Messaging:

  • Outbox pattern implemented for producer
  • Inbox pattern implemented for consumer
  • Message IDs used for deduplication
  • Unique constraints on message IDs

Observability:

  • Idempotency keys logged with requests
  • Dedupe rate tracked
  • Alerts configured for anomalies

Testing:

  • Deterministic replay tests
  • Concurrent duplicate tests
  • Load tests with duplicates
  • Failure scenario tests

Documentation:

  • Idempotency key format documented
  • TTL values documented
  • Retry behavior documented
  • Error codes documented (409 for processing, etc.)

Summary

Idempotency isn’t optional in distributed systems. It’s required.

Start simple:

  • Idempotency keys at the edge
  • Unique constraints in the database
  • Transactions for atomicity

Add complexity when needed:

  • Outbox pattern for producers
  • Inbox pattern for consumers
  • Saga-level idempotency for workflows

Test it:

  • Replay tests
  • Concurrent tests
  • Load tests

Monitor it:

  • Log keys
  • Track dedupe rates
  • Alert on anomalies

The code examples in the repository show working implementations. Use them as a starting point. Adapt them to your needs.

Remember: You can’t prevent duplicates. You can make them safe.
