I ship a lot of API/webhook integrations. Here’s how I make them NOT hurt in production 🔥
Source: Dev.to
If you do freelance backend long enough, you notice a pattern:
- Clients don’t pay for “beautiful code”.
- They pay for it working tomorrow.
Webhook integrations are the fastest way to get random chaos:
- duplicate events
- out‑of‑order delivery
- retries that DDoS you
- the classic “it worked yesterday 🤡”
Below is a practical checklist + a simple architecture that scales. No theory, just what works in production.
1️⃣ Assume the webhook will be duplicated. Because it will. ✅
Rule: every webhook must be idempotent.
How:
- Extract event_id from the payload (or generate a hash from stable fields).
- Store it with a status.
- On a repeat, return 200 OK and do nothing.
Returning a 500 for a duplicate only triggers more retries.
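A minimal sketch of that dedup check, assuming a SQLite table called seen_events and helper names (dedup_key, record_event) that I made up for illustration. The unique key does the work: the insert either lands or gets ignored, so two deliveries racing each other can't both win.

```python
import hashlib
import json
import sqlite3


def dedup_key(payload: dict) -> str:
    """Use the provider's event_id if present, else hash the payload."""
    if "event_id" in payload:
        return str(payload["event_id"])
    # Assumption: the whole payload is stable. In practice, hash only the
    # fields that never change between redeliveries.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def record_event(conn: sqlite3.Connection, payload: dict) -> bool:
    """Return True if the event is new, False if it is a duplicate."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO seen_events (dedup_key, status) "
        "VALUES (?, 'received')",
        (dedup_key(payload),),
    )
    conn.commit()
    # rowcount is 0 when the PRIMARY KEY already existed and the row was ignored.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE seen_events (dedup_key TEXT PRIMARY KEY, status TEXT NOT NULL)"
)
```

The handler would call record_event first and answer 200 OK either way, only enqueueing work when it returns True.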
2️⃣ Acknowledge fast. Process async. ⚡
A webhook handler that does real work inside the HTTP request is a trap.
My default flow:
- Receive webhook.
- Validate signature / basic checks.
- Save the raw payload + metadata to the DB.
- Return 200 OK immediately.
- Process the event in a worker / job queue.
This keeps the endpoint calm even when processing is slow or a downstream API times out, because the request itself only does one cheap write.
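Here is the shape of that flow as a framework-free sketch. Everything in it is a stand-in: handle_webhook plays the HTTP handler, and an in-memory queue.Queue plays the DB write plus job queue. A real setup would persist the event first, since an in-memory queue loses everything on restart.

```python
import queue
import threading

events: "queue.Queue[dict]" = queue.Queue()
processed: list = []


def handle_webhook(payload: dict) -> int:
    """Stand-in for the HTTP handler: cheap checks, hand off, return a status."""
    if "event_id" not in payload:
        return 400  # basic validation failed
    events.put(payload)  # stand-in for "save raw payload + enqueue job"
    return 200


def worker() -> None:
    """Runs outside the request and does the slow part."""
    while True:
        payload = events.get()
        try:
            processed.append(payload["event_id"])  # placeholder for business logic
        finally:
            events.task_done()


threading.Thread(target=worker, daemon=True).start()
```

The point of the split: handle_webhook never waits on the worker, so its response time does not depend on how slow the business logic is.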
3️⃣ Store raw payloads. Future you will thank you 🧠
When something breaks, the client will say: “I don’t know, it just didn’t send.”
If you don’t store raw payloads, you have no evidence and no replay.
Store:
- Full raw JSON payload
- Relevant headers
- Provider name
- Received timestamp
- Processing status
- Error message (if failed)
Having this data lets you:
- Replay events
- Debug edge cases
- Prove what happened
It turns “guessing” into “knowing”.
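One way the event record could look, sketched with SQLite. The table name, column names, and the two helpers (store_event, failed_events) are my assumptions, not a standard; adapt them to your DB.

```python
import json
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE webhook_events (
    id            INTEGER PRIMARY KEY,
    provider      TEXT NOT NULL,
    headers       TEXT NOT NULL,   -- JSON of the relevant headers
    raw_payload   TEXT NOT NULL,   -- body exactly as received
    received_at   TEXT NOT NULL,   -- ISO 8601, UTC
    status        TEXT NOT NULL DEFAULT 'received',
    error_message TEXT
)
"""


def store_event(conn: sqlite3.Connection, provider: str,
                headers: dict, raw_body: str) -> int:
    """Persist the event untouched and return its row id."""
    cur = conn.execute(
        "INSERT INTO webhook_events (provider, headers, raw_payload, received_at) "
        "VALUES (?, ?, ?, ?)",
        (provider, json.dumps(headers), raw_body,
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    return cur.lastrowid


def failed_events(conn: sqlite3.Connection, provider: str) -> list:
    """Return (id, payload) pairs to feed back into the processor for a replay."""
    rows = conn.execute(
        "SELECT id, raw_payload FROM webhook_events "
        "WHERE provider = ? AND status = 'failed' ORDER BY id",
        (provider,),
    )
    return [(row[0], json.loads(row[1])) for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
```

Storing the body as the exact string you received (not a re-serialized dict) matters if you ever need to re-check a signature during a replay.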
4️⃣ Security: verify signatures or don’t pretend it’s secure 🔒
If the provider supports signatures, verify them from day one, not later and not after the MVP.
Otherwise you expose a public endpoint that can be abused for spam or worse.
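A sketch of the check, assuming the provider signs the raw request body with HMAC-SHA256 and sends a hex digest in a header. That scheme is an assumption: providers differ (some add a timestamp or a prefix to the signed string), so follow your provider's docs for the exact format.

```python
import hashlib
import hmac


def sign(secret: bytes, raw_body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body (assumed scheme, check your provider)."""
    return hmac.new(secret, raw_body, hashlib.sha256).hexdigest()


def is_valid_signature(secret: bytes, raw_body: bytes, received: str) -> bool:
    expected = sign(secret, raw_body)
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the signature matched.
    return hmac.compare_digest(expected.encode(), received.encode())
```

Two things tend to bite here: verify against the raw bytes (parsing and re-serializing JSON changes them), and reject with a 4xx before anything gets stored or enqueued.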
5️⃣ Rate limits and backoff: retries are not your enemy, your implementation is 😅
When processing fails, avoid instant retries. Use backoff with growing delays (roughly exponential), e.g.:
- 1 min
- 5 min
- 30 min
- 2 h
- dead‑letter queue (manual review)
Most integration failures are temporary (provider downtime, DB hiccups, network glitches). Backoff makes the system survive like a tank.
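The schedule above fits in a few lines. This is a sketch: RETRY_DELAYS and next_attempt_at are names I picked, and returning None is my convention for "send it to the dead-letter queue".

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Delays from the list above: 1 min, 5 min, 30 min, 2 h.
RETRY_DELAYS = [
    timedelta(minutes=1),
    timedelta(minutes=5),
    timedelta(minutes=30),
    timedelta(hours=2),
]


def next_attempt_at(failed_attempts: int, now: datetime) -> Optional[datetime]:
    """Return when to retry, or None once the event belongs in the dead-letter queue."""
    if failed_attempts < 1:
        raise ValueError("failed_attempts starts at 1")
    if failed_attempts > len(RETRY_DELAYS):
        return None  # retries exhausted: park for manual review
    return now + RETRY_DELAYS[failed_attempts - 1]
```

If many events fail at once (provider outage), adding a little random jitter to each delay keeps them from all retrying in the same second.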
6️⃣ Logging that actually helps, not “we logged something” 📝
I log at two layers:
Request layer
- request ID
- provider
- event ID
- status returned
Job (worker) layer
- event ID
- job attempt number
- result
- full error stack (if any)
Extra rule: if a job fails, save a short human‑readable error near the event record. This lets you scan the DB later and instantly spot patterns.
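A rough sketch of the two layers as structured (JSON) log lines using the standard logging module. The function names and field names are mine; short_error is one possible way to produce the human-readable error stored next to the event.

```python
import json
import logging
import traceback
from typing import Optional

logger = logging.getLogger("webhooks")


def request_log(request_id: str, provider: str, event_id: str, status: int) -> str:
    """Request layer: one line per incoming webhook."""
    line = json.dumps({"layer": "request", "request_id": request_id,
                       "provider": provider, "event_id": event_id,
                       "status": status})
    logger.info(line)
    return line


def job_log(event_id: str, attempt: int, exc: Optional[Exception] = None) -> str:
    """Job layer: one line per attempt, with the full stack on failure."""
    entry = {"layer": "job", "event_id": event_id, "attempt": attempt,
             "result": "ok" if exc is None else "failed"}
    if exc is not None:
        entry["stack"] = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__))
    line = json.dumps(entry)
    if exc is None:
        logger.info(line)
    else:
        logger.error(line)
    return line


def short_error(exc: Exception, limit: int = 200) -> str:
    """Short human-readable error to save near the event record."""
    return f"{type(exc).__name__}: {exc}"[:limit]
```

Because event_id shows up in both layers, one search for it gives you the whole story of an event, from HTTP request to last retry.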
7️⃣ Minimal scalable structure (simple but powerful)
webhook_controller → accepts HTTP, validates, stores event, returns fast
event_store → saves raw payloads, dedup keys, statuses
processor → business logic (“what do we do with this event?”)
adapters → provider‑specific mapping (CRM A vs CRM B)
queue / worker → runs processing asynchronously with retry rules
Adding a new integration is just a matter of creating a new adapter; the rest stays untouched.
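To make the adapter idea concrete, here is a sketch. The Event shape, the two CRM payload formats, and the ADAPTERS registry are all invented for illustration; the only point is that the processor sees a single normalized type.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


@dataclass(frozen=True)
class Event:
    """Normalized event: the only shape the processor ever sees."""
    provider: str
    event_id: str
    kind: str
    data: dict


class Adapter(Protocol):
    def to_event(self, payload: dict) -> Event: ...


class CrmAAdapter:
    # Hypothetical payload shape: {"id": ..., "type": ..., "object": {...}}
    def to_event(self, payload: dict) -> Event:
        return Event("crm_a", payload["id"], payload["type"], payload["object"])


class CrmBAdapter:
    # Hypothetical payload shape: {"eventId": ..., "action": ..., "body": {...}}
    def to_event(self, payload: dict) -> Event:
        return Event("crm_b", payload["eventId"], payload["action"], payload["body"])


ADAPTERS: Dict[str, Adapter] = {"crm_a": CrmAAdapter(), "crm_b": CrmBAdapter()}


def normalize(provider: str, payload: dict) -> Event:
    """Pick the provider's adapter and map its payload to an Event."""
    return ADAPTERS[provider].to_event(payload)
```

Adding CRM C then means one new class and one new entry in ADAPTERS; the controller, store, and processor do not change.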
Common production “gotchas” (learned the annoying way) 🤝
Out‑of‑order events
You might receive an “updated” event before a “created” one.
Solution: allow upserts, store event history, and process based on the current state.
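A small sketch of an order-tolerant upsert. It assumes each event carries an occurred_at value from the provider (an integer here, for simplicity) and uses in-memory dicts in place of real tables; the field names are made up.

```python
contacts: dict = {}   # current state per contact
history: list = []    # every event, in arrival order


def apply_event(event: dict) -> None:
    """Upsert a contact so that the newest event wins field by field."""
    history.append(event)
    cid = event["contact_id"]
    current = contacts.get(cid, {"occurred_at": -1, "fields": {}})
    if event["occurred_at"] > current["occurred_at"]:
        # Newer event: its fields override what we have.
        fields = {**current["fields"], **event["fields"]}
        occurred_at = event["occurred_at"]
    else:
        # Stale event: only fill in fields we have never seen.
        fields = {**event["fields"], **current["fields"]}
        occurred_at = current["occurred_at"]
    contacts[cid] = {"occurred_at": occurred_at, "fields": fields}
```

With this, an "updated" that arrives before its "created" simply creates the record, and the late "created" can only fill gaps, never overwrite newer values.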
Provider sends partial data
Sometimes only IDs are sent and you must fetch the full details.
Solution: add a “hydration step” in the worker (API pull) and cache if needed.
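The hydration step could be sketched like this. make_hydrator and the object_id field are assumptions for the example, and fetch stands in for whatever API call your provider needs.

```python
from typing import Callable, Dict


def make_hydrator(fetch: Callable[[str], dict]) -> Callable[[dict], dict]:
    """Wrap a provider API call with a simple in-process cache."""
    cache: Dict[str, dict] = {}

    def hydrate(event: dict) -> dict:
        object_id = event["object_id"]
        if object_id not in cache:
            cache[object_id] = fetch(object_id)  # the API pull
        # Return a new dict so the stored raw event stays untouched.
        return {**event, "object": cache[object_id]}

    return hydrate
```

A cache like this never expires, so it can serve stale data for objects that change often; a short TTL or skipping the cache for "updated" events would be the safer default.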
Webhook timeouts
Processing inside the request leads to timeouts.
Solution: fast ACK, async processing (as described above).
TL;DR 🧾
To build webhook integrations that survive production:
- Idempotency is mandatory.
- Acknowledge fast, process asynchronously.
- Store raw payloads for replay and debugging.
- Verify signatures immediately.
- Implement sane retry/backoff strategies.
- Log with enough context to debug later.
If you’ve ever shipped webhooks in production, you already know: it’s never “done”; it’s “stable enough to survive real traffic” 😄
Drop your worst webhook horror story below 👇