I ship a lot of API/webhook integrations. Here’s how I make them NOT hurt in production 🔥

Published: (February 27, 2026 at 08:19 AM EST)
Source: Dev.to

If you do freelance backend long enough, you notice a pattern:

  • Clients don’t pay for “beautiful code”.
  • They pay for it working tomorrow.

Webhook integrations are the fastest way to get random chaos:

  • duplicate events
  • out‑of‑order delivery
  • retries that DDoS you
  • the classic “it worked yesterday 🤡”

Below is a practical checklist + a simple architecture that scales. No theory, just what works in production.

1️⃣ Assume the webhook will be duplicated. Because it will. ✅

Rule: every webhook must be idempotent.

How:

  1. Extract event_id from the payload (or generate a hash from stable fields).
  2. Store it with a status.
  3. On a repeat, return 200 OK and do nothing.

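Don't return a 500 for a duplicate: most providers treat any non-2xx response as a failed delivery, so an error here only triggers more retries.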
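
Here's a minimal sketch of those three steps in Python. The in-memory dict stands in for a DB table with a unique constraint on the key, and the field names `type` and `object_id` are assumptions for illustration; use whatever stable fields your provider actually sends.

```python
import hashlib
import json

# Hypothetical in-memory store: dedup key -> status.
# In production this would be a DB table with a UNIQUE constraint on the key.
_seen = {}


def dedup_key(payload: dict) -> str:
    """Prefer the provider's event_id; otherwise hash a few stable fields."""
    if "event_id" in payload:
        return str(payload["event_id"])
    # Assumption: "type" and "object_id" never change between redeliveries.
    stable = {k: payload.get(k) for k in ("type", "object_id")}
    canonical = json.dumps(stable, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()


def handle_webhook(payload: dict):
    """Return (http_status, message). Duplicates get a 200 and no side effects."""
    key = dedup_key(payload)
    if key in _seen:
        return 200, "duplicate ignored"
    _seen[key] = "received"
    return 200, "accepted"
```

The check-then-insert above is not safe under concurrency on its own; with a real database, let the unique constraint reject the second insert instead.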

2️⃣ Acknowledge fast. Process async. ⚡

A webhook handler that does real work inside the HTTP request is a trap.

My default flow:

  1. Receive webhook.
  2. Validate signature / basic checks.
  3. Save the raw payload + metadata to the DB.
  4. Return 200 OK immediately.
  5. Process the event in a worker / job queue.

This keeps the endpoint calm even when processing is slow or a downstream API times out: the only work left inside the request is one signature check and one insert.
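
A rough sketch of that flow, assuming SQLite as the event store and an in-process `queue.Queue` as the job queue (both are stand-ins; a real setup would likely use your main DB and a proper queue). The names `receive` and `work_once` are hypothetical, and the sketch is single-threaded on purpose.

```python
import queue
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, raw TEXT, status TEXT)")
jobs = queue.Queue()  # holds event row ids waiting for a worker


def receive(raw_body: str, signature_ok: bool) -> int:
    """HTTP side: validate, persist, enqueue, return a status code right away."""
    if not signature_ok:
        return 401
    cur = db.execute(
        "INSERT INTO events (raw, status) VALUES (?, 'pending')", (raw_body,)
    )
    db.commit()
    jobs.put(cur.lastrowid)
    return 200


def work_once(process) -> None:
    """Worker side: take one job, run the business logic, record the outcome."""
    event_id = jobs.get()
    (raw,) = db.execute("SELECT raw FROM events WHERE id = ?", (event_id,)).fetchone()
    try:
        process(raw)
        status = "done"
    except Exception:
        status = "failed"
    db.execute("UPDATE events SET status = ? WHERE id = ?", (status, event_id))
    db.commit()
```

Nothing in `receive` depends on how long `process` takes, which is the whole point.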

3️⃣ Store raw payloads. Future you will thank you 🧠

When something breaks, the client will say: “I don’t know, it just didn’t send.”
If you don’t store raw payloads, you have no evidence and no replay.

Store:

  • Full raw JSON payload
  • Relevant headers
  • Provider name
  • Received timestamp
  • Processing status
  • Error message (if failed)

Having this data lets you:

  • Replay events
  • Debug edge cases
  • Prove what happened

It turns “guessing” into “knowing”.
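
As an illustration, the record and a replay helper could look roughly like this. The class name `StoredEvent`, the status values, and `replay` are assumptions for the sketch, not a fixed schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class StoredEvent:
    provider: str
    raw_payload: str  # the exact body as received, not re-serialized
    headers: dict
    received_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    status: str = "pending"  # assumed values: pending | done | failed
    error: Optional[str] = None


def replay(events, process) -> int:
    """Re-run `process` over failed events; return how many now succeed."""
    fixed = 0
    for ev in events:
        if ev.status != "failed":
            continue
        try:
            process(ev.raw_payload)
            ev.status, ev.error = "done", None
            fixed += 1
        except Exception as exc:
            ev.error = str(exc)
    return fixed
```

Keeping the body as the original string matters if you ever need to re-verify a signature, since re-serialized JSON rarely matches byte for byte.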

4️⃣ Security: verify signatures or don’t pretend it’s secure 🔒

If the provider supports signatures, verify them from day one, not later and not "after the MVP".
Otherwise you are exposing a public endpoint that anyone can hit with forged events, spam, or worse.
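
A minimal sketch of the check, assuming the provider signs the raw request body with HMAC-SHA256 and sends the hex digest in a header. That scheme is an assumption: real providers differ (some add a timestamp, a prefix, or use base64), so follow their docs for the exact format.

```python
import hashlib
import hmac


def verify_signature(secret: bytes, raw_body: bytes, signature_hex: str) -> bool:
    """Recompute the HMAC over the raw bytes and compare in constant time."""
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())
```

Verify against the raw bytes you received, before any JSON parsing; a parsed and re-dumped body will usually produce a different digest.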

5️⃣ Rate limits and backoff: retries are not your enemy, your implementation is 😅

When processing fails, don't retry instantly. Use an escalating (roughly exponential) backoff schedule, e.g.:

  • 1 min
  • 5 min
  • 30 min
  • 2 h
  • dead‑letter queue (manual review)

Most integration failures are temporary (provider downtime, DB hiccups, network glitches). Backoff makes the system survive like a tank.
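
The schedule above can be sketched as a small lookup. `next_retry_at` is a hypothetical helper; returning `None` is the signal to move the event to the dead-letter queue.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# The delays from the list above, in order.
RETRY_DELAYS = [
    timedelta(minutes=1),
    timedelta(minutes=5),
    timedelta(minutes=30),
    timedelta(hours=2),
]


def next_retry_at(attempt: int, now: datetime) -> Optional[datetime]:
    """`attempt` is the number of failures so far (1-based).

    Returns when to try again, or None once the schedule is exhausted,
    meaning the event should go to the dead-letter queue.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    if attempt > len(RETRY_DELAYS):
        return None
    return now + RETRY_DELAYS[attempt - 1]
```

If many events fail at once, adding a little random jitter to each delay helps keep the retries from landing in one burst.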

6️⃣ Logging that actually helps, not “we logged something” 📝

I log at two layers:

Request layer

  • request ID
  • provider
  • event ID
  • status returned

Job (worker) layer

  • event ID
  • job attempt number
  • result
  • full error stack (if any)

Extra rule: if a job fails, save a short human‑readable error near the event record. This lets you scan the DB later and instantly spot patterns.
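
One way to sketch the two layers with the standard `logging` module and JSON lines. The function names and the 200-character limit are assumptions; the fields are the ones listed above.

```python
import json
import logging
import traceback

log = logging.getLogger("webhooks")


def log_request(request_id, provider, event_id, status) -> str:
    """Request layer: one structured line per incoming webhook."""
    line = json.dumps({
        "layer": "request",
        "request_id": request_id,
        "provider": provider,
        "event_id": event_id,
        "status": status,
    })
    log.info(line)
    return line


def log_job(event_id, attempt, exc=None) -> dict:
    """Job layer: result per attempt, with the full stack on failure."""
    record = {
        "layer": "job",
        "event_id": event_id,
        "attempt": attempt,
        "result": "failed" if exc is not None else "ok",
    }
    if exc is not None:
        record["stack"] = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )
    log.info(json.dumps(record))
    return record


def short_error(exc, limit=200) -> str:
    """Short human-readable error to store next to the event record."""
    return f"{type(exc).__name__}: {exc}"[:limit]
```

The shared `event_id` is what lets you join a request line to every job attempt that followed it.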

7️⃣ Minimal scalable structure (simple but powerful)

webhook_controller  → accepts HTTP, validates, stores event, returns fast
event_store         → saves raw payloads, dedup keys, statuses
processor           → business logic (“what do we do with this event?”)
adapters            → provider‑specific mapping (CRM A vs CRM B)
queue / worker      → runs processing asynchronously with retry rules

Adding a new integration is just a matter of creating a new adapter; the rest stays untouched.
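
To make the adapter idea concrete, here's a small sketch. The two payload shapes, the provider keys `crm_a` / `crm_b`, and the normalized fields are all made up for illustration.

```python
from typing import Callable, Dict


def crm_a_adapter(payload: dict) -> dict:
    # Hypothetical shape for "CRM A": flat fields.
    return {"contact_id": payload["id"], "email": payload["email_address"]}


def crm_b_adapter(payload: dict) -> dict:
    # Hypothetical shape for "CRM B": nested under "contact".
    contact = payload["contact"]
    return {"contact_id": contact["uid"], "email": contact["mail"]}


# Registry: adding an integration means adding one entry here.
ADAPTERS: Dict[str, Callable[[dict], dict]] = {
    "crm_a": crm_a_adapter,
    "crm_b": crm_b_adapter,
}


def process(provider: str, payload: dict, handle) -> None:
    """Map the provider payload to the internal shape, then run business logic."""
    normalized = ADAPTERS[provider](payload)
    handle(normalized)
```

The business logic in `handle` only ever sees the normalized shape, so it doesn't care which CRM the event came from.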

Common production “gotchas” (learned the annoying way) 🤝

Out‑of‑order events

You might receive an “updated” event before a “created” one.

Solution: allow upserts, store event history, and process based on the current state.
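
A sketch of that idea, assuming each event carries an `object_id`, a comparable `occurred_at`, and a `data` dict (all hypothetical field names; many providers expose something similar, some don't). Newer events overwrite, older ones only fill in fields that are still missing.

```python
records = {}   # object_id -> current state
history = []   # every event, in arrival order


def apply_event(event: dict) -> None:
    """Upsert that tolerates events arriving in the wrong order."""
    history.append(event)
    obj_id = event["object_id"]
    current = records.get(obj_id)
    if current is None:
        # First event we see for this object, whatever its type.
        records[obj_id] = {**event["data"], "occurred_at": event["occurred_at"]}
    elif event["occurred_at"] > current["occurred_at"]:
        # Newer than what we have: overwrite fields and advance the clock.
        current.update(event["data"])
        current["occurred_at"] = event["occurred_at"]
    else:
        # Stale event: keep newer values, only fill in gaps.
        for key, value in event["data"].items():
            current.setdefault(key, value)
```

For example, an "updated" event at time 2 followed by a late "created" event at time 1 ends with the updated name plus any extra fields only the "created" event carried.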

Provider sends partial data

Sometimes only IDs are sent and you must fetch the full details.

Solution: add a “hydration step” in the worker (API pull) and cache if needed.
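
A small sketch of a hydration step with a cache. `fetch` is whatever function calls the provider's API for you (hypothetical here), and the event is assumed to carry an `object_id`.

```python
def make_hydrator(fetch, cache=None):
    """Build a hydrate(event) function that pulls full details via `fetch`.

    `fetch(object_id)` stands for the provider API call; results are cached
    so repeated events for the same object don't trigger repeated pulls.
    """
    cache = {} if cache is None else cache

    def hydrate(event: dict) -> dict:
        obj_id = event["object_id"]
        if obj_id not in cache:
            cache[obj_id] = fetch(obj_id)  # the actual API pull
        # Return a new dict so the stored raw event stays untouched.
        return {**event, "object": cache[obj_id]}

    return hydrate
```

A cache like this can serve stale data if the object changes between events, so in practice you'd want an expiry or to skip the cache for "updated" events.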

Webhook timeouts

Processing inside the request leads to timeouts.

Solution: fast ACK, async processing (as described above).

TL;DR 🧾

To build webhook integrations that survive production:

  • Idempotency is mandatory.
  • Acknowledge fast, process asynchronously.
  • Store raw payloads for replay and debugging.
  • Verify signatures immediately.
  • Implement sane retry/backoff strategies.
  • Log with enough context to debug later.

If you’ve ever shipped webhooks in production, you already know: it’s never “done”; it’s “stable enough to survive real traffic” 😄

Drop your worst webhook horror story below 👇
