Idempotency by Design: Building 'Exactly-Once' Effects on 'At-Least-Once' Rails
Distributed systems retry. Networks drop. Without idempotency, users pay twice or you create duplicate orders.
This article shows a practical, end-to-end blueprint for building idempotent systems. Not theory. Working code you can use.
Why This Matters
You build a payment API. A user clicks “Pay” once. The network drops. The client retries. Your server processes it twice. The user gets charged twice.
Or you have a message queue. A consumer crashes mid-processing. It restarts. It processes the same message again. You create duplicate records.
These aren’t edge cases. They happen every day in production.
The problem isn’t that systems retry. Retries are good. The problem is that your code isn’t ready for them.
Idempotency fixes this. It makes retries safe. If you call the same operation twice with the same key, it only happens once.
What “Exactly-Once” Really Means
“Exactly-once” is a promise. It means: this operation will execute exactly once, even if you call it multiple times.
But here’s the thing: you can’t guarantee exactly-once delivery. Networks drop. Services crash. Messages get duplicated.
What you can guarantee is exactly-once effects. Even if the message arrives twice, the effect happens once.
That’s idempotency.
Where Duplicates Come From
Client retries: User clicks a button. Network is slow. Client times out. Client retries. Server processes both requests.
502s and timeouts: Server processes request. Response gets lost. Client retries. Server processes again.
Consumer restarts: Consumer processes message. Crashes before acknowledging. Restarts. Processes same message again.
Batch replays: You replay a batch of messages. Some were already processed. They get processed again.
Clock skew: Two servers have different clocks. They generate the same idempotency key. They conflict.
All of these are normal. Your system needs to handle them.
Idempotency Keys at the Edge
The first line of defense is idempotency keys. The client sends a unique key with each request. The server uses it to deduplicate.
How to Choose Keys
You have two options: business keys or opaque UUIDs.
Business keys: Use something meaningful. Order ID. User ID + timestamp. Payment ID.
// Business key example
const idempotencyKey = `order-${orderId}-${userId}`;
Good when:
- You want to prevent duplicates across different clients
- The key has business meaning
- You need to debug by business context
Bad when:
- Multiple clients might generate the same key
- The key changes between retries
- Clock skew causes collisions
Opaque UUIDs: Generate a random UUID. Client generates it once. Sends it with every retry.
// Opaque UUID example
const idempotencyKey = crypto.randomUUID();
Good when:
- Each client generates unique keys
- You don’t need business meaning
- You want to avoid collisions
Bad when:
- Client generates a new UUID on retry (defeats the purpose)
- You need to debug by business context
Best practice: Use business keys when you can. Use UUIDs as fallback. Document which one you use.
TTLs and Storage
Idempotency keys need storage. And they need expiration.
Storage options:
- Redis: Fast. Good for high throughput. But data can be lost.
- Database: Durable. But slower. Good for critical operations.
- Hybrid: Check Redis first. Fall back to database.
TTLs: Keys shouldn’t live forever. Set a TTL based on your use case.
// TTL examples
const TTL = {
payment: 24 * 60 * 60, // 24 hours
order: 7 * 24 * 60 * 60, // 7 days
notification: 60 * 60 // 1 hour
};
Too short: Legitimate retries get rejected. Too long: Storage fills up. Old keys cause false positives.
Rule of thumb: Set TTL to 2-3x your maximum retry window. If clients retry for up to 1 hour, set TTL to 2-3 hours.
Failure Modes
Key reuse: Client reuses a key for a different operation. Server returns cached result. Wrong result.
Fix: Include operation type in the key. Or validate the request matches the cached request.
Clock skew: Two servers have different clocks. They generate the same timestamp-based key. Collision.
Fix: Use UUIDs. Or include server ID in the key. Or use a distributed ID generator.
Storage failure: Redis goes down. Database is slow. You can’t check idempotency.
Fix: Fail open or fail closed? Fail closed is safer (reject requests). But it breaks availability. Fail open is more available but allows duplicates. Choose based on your risk tolerance.
Persistence Patterns
Idempotency keys are the first layer. You also need persistence patterns. These handle the “what if the server crashes mid-operation” problem.
Outbox Pattern (Producer)
You need to publish a message. But you also need to update a database. These should happen together. But they’re in different systems.
The outbox pattern fixes this.
How it works:
- Write to database and outbox table in one transaction
- Background job reads from outbox
- Publishes to message queue
- Marks as published
-- Outbox table
CREATE TABLE outbox (
id BIGSERIAL PRIMARY KEY,
message_id UUID NOT NULL UNIQUE,
topic VARCHAR(255) NOT NULL,
payload JSONB NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
published_at TIMESTAMP,
retry_count INT DEFAULT 0
);
CREATE INDEX idx_outbox_unpublished ON outbox (created_at)
WHERE published_at IS NULL;
Why it works: Database transaction ensures atomicity. Either both write and outbox record succeed, or neither does. Background job handles publishing. If it fails, it retries.
Implementation:
async function createOrder(orderData, idempotencyKey) {
return await db.transaction(async (tx) => {
// Check idempotency
const existing = await tx.query(
'SELECT result FROM idempotency_keys WHERE key = $1',
[idempotencyKey]
);
if (existing.rows.length > 0) {
return JSON.parse(existing.rows[0].result);
}
// Create order
const order = await tx.query(
'INSERT INTO orders (user_id, amount) VALUES ($1, $2) RETURNING *',
[orderData.userId, orderData.amount]
);
// Write to outbox
await tx.query(
'INSERT INTO outbox (message_id, topic, payload) VALUES ($1, $2, $3)',
[
crypto.randomUUID(),
'order.created',
JSON.stringify({ orderId: order.rows[0].id })
]
);
// Store idempotency result
await tx.query(
'INSERT INTO idempotency_keys (key, result, expires_at) VALUES ($1, $2, $3)',
[
idempotencyKey,
JSON.stringify(order.rows[0]),
new Date(Date.now() + 24 * 60 * 60 * 1000)
]
);
return order.rows[0];
});
}
Inbox Pattern (Consumer)
You consume messages. But what if you process the same message twice?
The inbox pattern deduplicates at the consumer.
How it works:
- Consumer receives message
- Checks inbox table for message_id
- If exists, skip
- If not, process and insert into inbox
- Acknowledge message
-- Inbox table
CREATE TABLE inbox (
message_id VARCHAR(255) PRIMARY KEY,
topic VARCHAR(255) NOT NULL,
payload JSONB NOT NULL,
processed_at TIMESTAMP NOT NULL DEFAULT NOW(),
processing_duration_ms INT
);
CREATE INDEX idx_inbox_processed ON inbox (processed_at);
Why it works: Unique constraint on message_id prevents duplicates. Even if consumer processes the same message twice, the second insert fails. Safe.
Implementation:
func (c *Consumer) ProcessMessage(msg kafka.Message) error {
messageID := string(msg.Key)
// Check inbox
var existing string
err := c.db.QueryRow(
"SELECT message_id FROM inbox WHERE message_id = $1",
messageID,
).Scan(&existing)
if err == nil {
// Already processed
log.Printf("Message %s already processed, skipping", messageID)
return nil
}
if err != sql.ErrNoRows {
return err
}
// Process message
start := time.Now()
if err := c.handleMessage(msg); err != nil {
return err
}
duration := time.Since(start)
// Insert into inbox
_, err = c.db.Exec(
"INSERT INTO inbox (message_id, topic, payload, processing_duration_ms) VALUES ($1, $2, $3, $4)",
messageID,
msg.Topic,
msg.Value,
duration.Milliseconds(),
)
if err != nil {
// Might be a race condition - check again
var check string
checkErr := c.db.QueryRow(
"SELECT message_id FROM inbox WHERE message_id = $1",
messageID,
).Scan(&check)
if checkErr == nil {
// Another consumer processed it - that's fine
return nil
}
return err
}
return nil
}
Unique Constraints vs. Conditional Writes
You have two options for deduplication: unique constraints or conditional writes.
Unique constraints: Database enforces uniqueness. Insert fails if duplicate.
CREATE TABLE idempotency_keys (
key VARCHAR(255) PRIMARY KEY,
result JSONB NOT NULL,
expires_at TIMESTAMP NOT NULL
);
Pros: Simple. Database handles it. Cons: Insert failures need handling. Race conditions possible.
Conditional writes: Check-then-insert pattern. Or use database-specific features.
-- PostgreSQL example
INSERT INTO idempotency_keys (key, result, expires_at)
VALUES ($1, $2, $3)
ON CONFLICT (key) DO UPDATE SET
result = EXCLUDED.result,
expires_at = EXCLUDED.expires_at
WHERE idempotency_keys.expires_at < NOW();
Pros: Handles conflicts gracefully. Can update expired keys. Cons: More complex. Database-specific.
Best practice: Use unique constraints for simplicity. Use conditional writes when you need more control.
Transaction Boundaries
Transactions matter. Get them wrong, and you get race conditions.
Problem:
// WRONG: Race condition
const existing = await checkIdempotency(key);
if (!existing) {
const result = await processRequest();
await storeIdempotency(key, result);
}
Two requests can both pass the check. Both process. Both store. Duplicate.
Solution: Do everything in one transaction.
// RIGHT: Atomic
await db.transaction(async (tx) => {
const existing = await tx.query(
'SELECT result FROM idempotency_keys WHERE key = $1 FOR UPDATE',
[key]
);
if (existing.rows.length > 0) {
return JSON.parse(existing.rows[0].result);
}
const result = await processRequest();
await tx.query(
'INSERT INTO idempotency_keys (key, result) VALUES ($1, $2)',
[key, JSON.stringify(result)]
);
return result;
});
FOR UPDATE locks the row. Second request waits. First request completes. Second request sees the result. No duplicate.
Messaging Specifics
Different message queues have different guarantees. Know what you’re working with.
Kafka “EOS” (Exactly-Once Semantics)
Kafka supports exactly-once semantics. But it’s not magic. You need to configure it.
Transactions: Enable transactions. Producer sends messages in a transaction. Either all succeed or all fail.
config := sarama.NewConfig()
config.Producer.Transactional.ID = "my-producer-id"
config.Producer.Idempotent = true
config.Net.MaxOpenRequests = 1
producer, err := sarama.NewSyncProducer(brokers, config)
if err != nil {
return err
}
producer.BeginTxn()
producer.SendMessage(&sarama.ProducerMessage{
Topic: "orders",
Key: sarama.StringEncoder(orderID),
Value: sarama.ByteEncoder(payload),
})
producer.CommitTxn()
Producer ID and sequence numbers: Kafka tracks producer ID and sequence numbers. Duplicate messages get rejected.
Limitations:
- Only works within a transaction
- Requires idempotent producer
- Consumer still needs deduplication (inbox pattern)
What you can actually promise: Kafka EOS prevents duplicates from producer retries. It doesn’t prevent duplicates from consumer restarts. You still need inbox pattern.
SQS/Kinesis with Dedupe Windows
SQS has message deduplication. Kinesis has similar features.
SQS:
- Deduplication ID: Client provides. SQS deduplicates within 5 minutes.
- Content-based deduplication: SQS hashes message body. Deduplicates within 5 minutes.
const params = {
QueueUrl: queueUrl,
MessageBody: JSON.stringify(data),
MessageDeduplicationId: idempotencyKey,
MessageGroupId: 'orders'
};
await sqs.sendMessage(params).promise();
Kinesis:
- Partition key determines shard
- Same partition key goes to same shard
- Consumer needs deduplication (inbox pattern)
Limitations:
- SQS deduplication window is 5 minutes
- Kinesis doesn’t deduplicate automatically
- Both need consumer-side deduplication for longer windows
Exactly-Once Illusions
“Exactly-once” is marketing. What you actually get:
Kafka EOS:
- Exactly-once within a transaction
- Idempotent producer prevents duplicates
- Consumer still needs deduplication
SQS deduplication:
- Deduplication within 5 minutes
- After 5 minutes, duplicates possible
- Consumer needs deduplication
Reality: You can’t guarantee exactly-once delivery. You can guarantee exactly-once effects. That’s idempotency.
HTTP + DB Example (End-to-End)
Let’s build a complete example. A REST endpoint with idempotency keys. Database schema. Safe retries.
REST Endpoint with Idempotency-Key
const express = require('express');
const { Pool } = require('pg');
const crypto = require('crypto');
const app = express();
app.use(express.json());
const pool = new Pool({
connectionString: process.env.DATABASE_URL
});
async function handleRequest(req, res) {
const idempotencyKey = req.headers['idempotency-key'];
if (!idempotencyKey) {
return res.status(400).json({ error: 'Idempotency-Key header required' });
}
const client = await pool.connect();
try {
await client.query('BEGIN');
// Check for existing result
const existing = await client.query(
`SELECT result, status FROM idempotency_keys
WHERE key = $1 AND expires_at > NOW() FOR UPDATE`,
[idempotencyKey]
);
if (existing.rows.length > 0) {
const { result, status } = existing.rows[0];
if (status === 'completed') {
await client.query('COMMIT');
return res.status(200).json(JSON.parse(result));
}
if (status === 'processing') {
await client.query('COMMIT');
return res.status(409).json({ error: 'Request already processing' });
}
}
// Mark as processing
await client.query(
`INSERT INTO idempotency_keys (key, status, expires_at)
VALUES ($1, 'processing', NOW() + INTERVAL '24 hours')
ON CONFLICT (key) DO UPDATE SET status = 'processing'`,
[idempotencyKey]
);
await client.query('COMMIT');
} catch (err) {
await client.query('ROLLBACK');
throw err;
} finally {
client.release();
}
// Process request (outside transaction to avoid long locks)
let result;
try {
result = await processOrder(req.body);
} catch (err) {
// Mark as failed
await pool.query(
`UPDATE idempotency_keys SET status = 'failed', error = $1
WHERE key = $2`,
[err.message, idempotencyKey]
);
return res.status(500).json({ error: err.message });
}
// Store result
await pool.query(
`UPDATE idempotency_keys
SET status = 'completed', result = $1
WHERE key = $2`,
[JSON.stringify(result), idempotencyKey]
);
res.status(201).json(result);
}
async function processOrder(orderData) {
// Your business logic here
const order = {
id: crypto.randomUUID(),
userId: orderData.userId,
amount: orderData.amount,
status: 'created',
createdAt: new Date()
};
// Write to database
await pool.query(
'INSERT INTO orders (id, user_id, amount, status) VALUES ($1, $2, $3, $4)',
[order.id, order.userId, order.amount, order.status]
);
return order;
}
app.post('/orders', handleRequest);
app.listen(3000, () => {
console.log('Server running on port 3000');
});
DB Schema for Idempotency Records
-- Idempotency keys table
CREATE TABLE idempotency_keys (
key VARCHAR(255) PRIMARY KEY,
status VARCHAR(50) NOT NULL, -- 'processing', 'completed', 'failed'
result JSONB,
error TEXT,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
expires_at TIMESTAMP NOT NULL,
request_hash VARCHAR(64) -- Hash of request body for validation
);
CREATE INDEX idx_idempotency_expires ON idempotency_keys (expires_at);
-- Orders table
CREATE TABLE orders (
id UUID PRIMARY KEY,
user_id UUID NOT NULL,
amount DECIMAL(10, 2) NOT NULL,
status VARCHAR(50) NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_orders_user ON orders (user_id);
Safe Retries Across 3 Failure Points
Failure point 1: Before processing Client sends request. Server receives. Checks idempotency. Finds existing. Returns cached result. Safe.
Failure point 2: During processing Client sends request. Server receives. Starts processing. Crashes. Client retries. Server checks idempotency. Sees “processing”. Returns 409. Client waits. Safe.
Failure point 3: After processing Client sends request. Server receives. Processes. Stores result. Response lost. Client retries. Server checks idempotency. Finds result. Returns cached result. Safe.
All three are handled.
Workflows and Sagas
Idempotency gets complex in workflows. Multiple steps. Multiple services. Multiple failure points.
How Idempotency Fits with Saga Steps
A saga is a distributed transaction. Multiple steps. Each step can fail. You need compensation.
Example:
- Create order
- Reserve inventory
- Charge payment
- Send notification
If step 3 fails, compensate: release inventory, cancel order.
Idempotency per step: Each step needs its own idempotency key.
const sagaId = crypto.randomUUID();
// Step 1: Create order
const orderKey = `${sagaId}:create-order`;
const order = await createOrderWithIdempotency(orderData, orderKey);
// Step 2: Reserve inventory
const inventoryKey = `${sagaId}:reserve-inventory:${order.id}`;
await reserveInventoryWithIdempotency(order.id, inventoryKey);
// Step 3: Charge payment
const paymentKey = `${sagaId}:charge-payment:${order.id}`;
await chargePaymentWithIdempotency(order.id, paymentKey);
// Step 4: Send notification
const notificationKey = `${sagaId}:send-notification:${order.id}`;
await sendNotificationWithIdempotency(order.id, notificationKey);
Saga-level idempotency: Also track the saga itself. Prevent running the same saga twice.
CREATE TABLE saga_executions (
saga_id VARCHAR(255) PRIMARY KEY,
status VARCHAR(50) NOT NULL,
current_step INT,
steps JSONB NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT NOW()
);
Compensations
Compensations need idempotency too. If you compensate twice, you undo too much.
async function compensateOrder(orderId, sagaId) {
const compensationKey = `${sagaId}:compensate:${orderId}`;
// Check if already compensated
const existing = await checkIdempotency(compensationKey);
if (existing) {
return existing;
}
// Compensate
await releaseInventory(orderId);
await cancelOrder(orderId);
// Store compensation
await storeIdempotency(compensationKey, { status: 'compensated' });
}
Observability
You can’t fix what you can’t see. Log idempotency keys. Track dedupe rates. Alert on anomalies.
Logging the Key
Log the idempotency key with every request. Makes debugging possible.
logger.info('Processing request', {
idempotencyKey,
userId,
operation: 'create_order',
traceId
});
Dedupe Rates
Track how often you deduplicate. High rates mean lots of retries. Might indicate problems.
metrics.increment('idempotency.duplicate', {
operation: 'create_order'
});
Anomaly Alerts
Alert on:
- Sudden spike in duplicates (might indicate client bug)
- High duplicate rate for specific operation (might indicate timeout issue)
- Idempotency check failures (might indicate storage problem)
if (duplicateRate > 0.1) {
alert.send('High duplicate rate detected', {
rate: duplicateRate,
operation: 'create_order'
});
}
Testing
Test idempotency. Not just happy path. Test duplicates. Test failures. Test race conditions.
Deterministic Replay Tests
Replay the same request multiple times. Verify same result.
test('idempotency: same request returns same result', async () => {
const key = 'test-key-123';
const request = { userId: 'user-1', amount: 100 };
const result1 = await createOrder(request, key);
const result2 = await createOrder(request, key);
const result3 = await createOrder(request, key);
expect(result1.id).toBe(result2.id);
expect(result2.id).toBe(result3.id);
});
Chaos and Forced Duplicates
Inject duplicates. See what happens.
test('idempotency: concurrent duplicates', async () => {
const key = 'test-key-456';
const request = { userId: 'user-2', amount: 200 };
// Send 10 concurrent requests with same key
const results = await Promise.all(
Array(10).fill(null).map(() => createOrder(request, key))
);
// All should return same result
const ids = results.map(r => r.id);
const uniqueIds = new Set(ids);
expect(uniqueIds.size).toBe(1);
});
Load Test Snippet
Test under load. See if idempotency holds.
// k6 load test
import http from 'k6/http';
import { check } from 'k6';
export const options = {
vus: 100,
duration: '30s',
};
export default function () {
const idempotencyKey = `test-${__VU}-${__ITER}`;
const payload = JSON.stringify({
userId: 'user-123',
amount: 100
});
const params = {
headers: {
'Content-Type': 'application/json',
'Idempotency-Key': idempotencyKey,
},
};
const res = http.post('http://localhost:3000/orders', payload, params);
check(res, {
'status is 201 or 200': (r) => r.status === 201 || r.status === 200,
'response has order id': (r) => {
const body = JSON.parse(r.body);
return body.id !== undefined;
},
});
// Retry same request
const res2 = http.post('http://localhost:3000/orders', payload, params);
check(res2, {
'retry returns same result': (r) => {
const body1 = JSON.parse(res.body);
const body2 = JSON.parse(r.body);
return body1.id === body2.id;
},
});
}
Checklist
Here’s a go-live checklist. Paste it into your PR.
Idempotency Keys:
- Idempotency-Key header required for write operations
- Keys are validated (format, length)
- Keys include operation type or are operation-specific
- TTL set appropriately (2-3x max retry window)
Storage:
- Idempotency storage chosen (Redis, DB, or hybrid)
- Unique constraints on idempotency key
- Indexes on expiration for cleanup
- Cleanup job for expired keys
Transactions:
- Idempotency check in same transaction as business logic
- FOR UPDATE used to prevent race conditions
- Status tracking (processing, completed, failed)
Messaging:
- Outbox pattern implemented for producer
- Inbox pattern implemented for consumer
- Message IDs used for deduplication
- Unique constraints on message IDs
Observability:
- Idempotency keys logged with requests
- Dedupe rate tracked
- Alerts configured for anomalies
Testing:
- Deterministic replay tests
- Concurrent duplicate tests
- Load tests with duplicates
- Failure scenario tests
Documentation:
- Idempotency key format documented
- TTL values documented
- Retry behavior documented
- Error codes documented (409 for processing, etc.)
Summary
Idempotency isn’t optional in distributed systems. It’s required.
Start simple:
- Idempotency keys at the edge
- Unique constraints in the database
- Transactions for atomicity
Add complexity when needed:
- Outbox pattern for producers
- Inbox pattern for consumers
- Saga-level idempotency for workflows
Test it:
- Replay tests
- Concurrent tests
- Load tests
Monitor it:
- Log keys
- Track dedupe rates
- Alert on anomalies
The code examples in the repository show working implementations. Use them as a starting point. Adapt them to your needs.
Remember: You can’t prevent duplicates. You can make them safe.
Discussion
Loading comments...