Webhooks at Scale: Designing an Idempotent, Replay-Safe, and Observable Webhook System

Published: January 19, 2026 at 04:19 PM EST
Source: Dev.to

Introduction

Webhooks look easy until your system processes the same payment three times, drops a critical event, and you can’t prove what actually happened. This article is a production‑grade deep dive into building a webhook ingestion system that survives retries, replays, out‑of‑order delivery, provider bugs, and your own future self.

Most webhook providers promise:

  • at‑least‑once delivery
  • retries on failure
  • signed payloads

What they don’t promise:

  • ordering
  • uniqueness
  • consistency
  • sane retry behavior

Reality: webhooks are an unreliable distributed queue that you do not control. Treat them as such.

Typical failure modes:

  • Duplicate deliveries processed as if they were new events
  • Provider retries for hours after success
  • Events arriving out of order
  • Partial failures mid‑processing
  • Clock skew breaking signatures
  • Silent drops with no audit trail

A correct design assumes all of these happen daily.

Architecture Overview

Webhook Provider
   │  POST /webhook
   ▼
Ingress Layer (Fast, Stateless)
   │  enqueue
   ▼
Persistent Event Store
   │  dedupe + order
   ▼
Event Processor
   │  side effects
   ▼
Domain Services

Key Principle

Never do business logic in the webhook handler.

Webhook endpoints must:

  1. Verify the signature
  2. Persist the raw payload
  3. Return a 2xx response

Anything else belongs downstream.

Minimal webhook handler (Node/Express)

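app.post('/webhook', async (req, res) => {
  try {
    verifySignature(req);     // assumed to throw on an invalid signature
    await storeRawEvent(req); // persist before acknowledging
    res.status(200).end();
  } catch (err) {
    // Never acknowledge an event that was not verified and stored.
    res.status(500).end();
  }
});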

If your endpoint takes more than a second or two to respond, expect retries: many providers enforce short timeouts and treat a slow response as a failed delivery.

What to store

  • All request headers
  • Raw request body
  • Reception timestamp
  • Provider event ID (if any)
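
As a rough illustration of the list above, the stored record might be built like this. The field names and the `x-event-id` header are assumptions made for the example, not any provider's actual format.

```javascript
// Hypothetical record shape for a raw webhook event; field names are illustrative.
function buildEventRecord({ headers, rawBody }, receivedAt = new Date()) {
  // Lowercase header names so later lookups do not depend on provider casing.
  const normalizedHeaders = {};
  for (const [name, value] of Object.entries(headers)) {
    normalizedHeaders[name.toLowerCase()] = value;
  }
  return {
    headers: normalizedHeaders,
    // Base64 keeps the exact bytes, which signature re-verification needs.
    rawBody: Buffer.from(rawBody).toString('base64'),
    receivedAt: receivedAt.toISOString(),
    // 'x-event-id' is an assumed header name; many providers put the ID in the body.
    providerEventId: normalizedHeaders['x-event-id'] ?? null,
  };
}
```

Storing the body as exact bytes rather than parsed JSON means the event can be re-verified and replayed later without loss.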

Idempotency

If your system is not idempotent, retries become data corruption.

Wrong approaches

  • “We’ll check if status already changed” ❌
  • “We’ll trust provider event IDs” ❌

Correct approach

Create your own idempotency key:

const key = hash(`${provider}:${eventType}:${externalObjectId}`);

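Persist the key with a unique constraint. If the insert fails because the key already exists, treat the event as a duplicate and skip it safely. Note that this key collapses every event of the same type for the same object into one, so add a version or sequence field to it when an object can legitimately receive the same event type more than once.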
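
A minimal sketch of this approach, assuming SHA-256 as the hash and using an in-memory class as a stand-in for a table with a unique constraint. `ProcessedKeys` and `handleOnce` are hypothetical names introduced for the example.

```javascript
const crypto = require('node:crypto');

// Derive the idempotency key from the same fields as the snippet above.
function idempotencyKey(provider, eventType, externalObjectId) {
  return crypto
    .createHash('sha256')
    .update(`${provider}:${eventType}:${externalObjectId}`)
    .digest('hex');
}

// In-memory stand-in for a table with a unique constraint on `key`.
class ProcessedKeys {
  constructor() {
    this.keys = new Set();
  }
  // Returns true if the key was newly inserted, false if it already existed.
  insert(key) {
    if (this.keys.has(key)) return false;
    this.keys.add(key);
    return true;
  }
}

// Apply the event only if its key has not been seen before.
function handleOnce(store, event, apply) {
  const key = idempotencyKey(event.provider, event.type, event.objectId);
  if (!store.insert(key)) return 'duplicate';
  apply(event);
  return 'applied';
}
```

With a real database, the key insert and the state change would normally share one transaction, so that a crash between the two does not mark an event as handled when it was not.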

Ordering

Providers do not guarantee ordering. Never assume:

  • Event A arrives before event B
  • Timestamps are monotonic

Strategy

Model events as state transitions and reject invalid transitions:

if (!isValidTransition(currentState, nextEvent)) {
  logAndIgnore();
}

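This makes the system tolerant of out-of-order delivery: a stale or premature event is rejected instead of corrupting state. Log rejected events so that one which merely arrived early can be replayed later.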
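
One way to implement `isValidTransition` is a lookup table of allowed transitions. The payment lifecycle below, with its state and event names, is a made-up example rather than any provider's schema.

```javascript
// Hypothetical payment lifecycle: current state -> event type -> next state.
const TRANSITIONS = {
  pending: { 'payment.succeeded': 'paid', 'payment.failed': 'failed' },
  paid: { 'payment.refunded': 'refunded' },
  failed: {},
  refunded: {},
};

function isValidTransition(currentState, nextEvent) {
  const allowed = TRANSITIONS[currentState] ?? {};
  return Object.prototype.hasOwnProperty.call(allowed, nextEvent.type);
}

// Returns the new state, or the unchanged state when the event is rejected.
function applyEvent(currentState, nextEvent) {
  if (!isValidTransition(currentState, nextEvent)) return currentState;
  return TRANSITIONS[currentState][nextEvent.type];
}
```

A duplicate `payment.succeeded` that arrives after the payment is already `paid` matches no transition, so it leaves the state untouched.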

Transactional Outbox Pattern

Databases are transactional; external APIs are not. Use the outbox pattern:

  1. Write the domain change and an outbox record in the same transaction.
  2. Commit the transaction.
  3. An asynchronous worker reads pending outbox records and executes side effects.
  4. Mark the outbox record as done.
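
The four steps above can be sketched as follows. The `db` object is an in-memory stand-in for a real database, and `markOrderPaid` and `drainOutbox` are hypothetical names. A real implementation would perform step 1 inside a single database transaction and would await the external call.

```javascript
// In-memory stand-in for a database with an orders table and an outbox table.
const db = { orders: new Map(), outbox: [] };

function markOrderPaid(orderId) {
  // Steps 1-2: the domain change and the outbox record are written together.
  // With a real database, both writes belong in one transaction.
  db.orders.set(orderId, { status: 'paid' });
  db.outbox.push({
    id: db.outbox.length + 1,
    type: 'send_receipt',
    orderId,
    status: 'pending',
  });
}

// Kept synchronous for brevity; a real worker would await the external call.
function drainOutbox(sideEffect) {
  for (const record of db.outbox) {
    if (record.status !== 'pending') continue;
    try {
      sideEffect(record);     // Step 3: execute the side effect
      record.status = 'done'; // Step 4: mark the record as done
    } catch (err) {
      // Leave the record pending so the next run retries it.
    }
  }
}
```

If the worker crashes between steps 3 and 4, the record is retried, so the side effect itself should tolerate being executed more than once.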

Benefits

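  • Keeps the domain change and its side effects from diverging after a partial failure.
  • Combined with idempotent side effects, prevents double emails and double charges.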

Signature Verification: Common Mistakes

  • Parsing JSON before verification.
  • Ignoring header casing.
  • Using the system clock blindly.

Best practices

  • Verify against the raw body.
  • Allow a small clock skew when checking timestamps.
  • Fail closed: if verification fails, reject the request and do not retry or process it internally.
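
A minimal sketch of these practices using Node's built-in `crypto` module. The signing scheme (HMAC-SHA256 over `timestamp.rawBody`, hex-encoded) and the five-minute tolerance are assumptions made for illustration. Real providers each define their own format, so check their documentation.

```javascript
const crypto = require('node:crypto');

const TOLERANCE_SECONDS = 300; // assumed allowance for clock skew

// Assumed scheme: hex(HMAC-SHA256(secret, `${timestamp}.${rawBody}`)).
function sign(secret, timestamp, rawBody) {
  return crypto
    .createHmac('sha256', secret)
    .update(`${timestamp}.`)
    .update(rawBody)
    .digest('hex');
}

function isSignatureValid({ secret, rawBody, timestamp, signature, now = Date.now() }) {
  // Reject timestamps too far from our clock, in either direction.
  const skewSeconds = Math.abs(now / 1000 - Number(timestamp));
  if (!Number.isFinite(skewSeconds) || skewSeconds > TOLERANCE_SECONDS) return false;

  // Compute the expected signature over the raw bytes, never over re-serialized JSON.
  const expected = Buffer.from(sign(secret, timestamp, rawBody), 'hex');
  const received = Buffer.from(String(signature), 'hex');

  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (expected.length !== received.length) return false;
  return crypto.timingSafeEqual(expected, received);
}
```

The function returns `false` on every failure path, including a missing timestamp or a malformed signature, which is the fail-closed behavior described above.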

Observability & Auditing

You need to answer three questions for every webhook:

  1. Did we receive it?
  2. Did we process it?
  3. What did it change?

Minimum requirements

  • Event ID traceable across logs.
  • Processing status persisted.
  • Dead‑letter queue for failures.
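
These requirements might be wired together roughly as follows. The event shape, the status names, and the retry budget of three attempts are all assumptions made for the sketch.

```javascript
const MAX_ATTEMPTS = 3; // assumed retry budget before dead-lettering

// event: { id, status, attempts } as persisted at ingestion (hypothetical shape).
function processEvent(event, handler, deadLetters, log = console.log) {
  event.attempts += 1;
  try {
    handler(event);
    event.status = 'processed';
  } catch (err) {
    event.lastError = err.message;
    if (event.attempts >= MAX_ATTEMPTS) {
      // Out of retries: park the event for manual inspection or replay.
      event.status = 'dead';
      deadLetters.push(event);
    } else {
      event.status = 'failed';
    }
  }
  // One structured log line per attempt, always carrying the event ID.
  log(JSON.stringify({ eventId: event.id, status: event.status, attempts: event.attempts }));
  return event.status;
}
```

Because the status and attempt count live on the persisted event, "did we process it?" becomes a lookup by event ID rather than a search through logs.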

If you can’t answer the three questions above in under 5 minutes, your system is blind. And if any of the minimum requirements is missing, you risk bugs you can’t trace or undo.

Conclusion

Webhooks are not callbacks; they are untrusted, replayable messages. Once you treat them as such—storing raw payloads, enforcing idempotency, handling out‑of‑order delivery, and using an outbox for side effects—they become boring, reliable infrastructure. And boring infrastructure is the goal.
