My mock server lied to me. So I built a stateful API sandbox.
Source: Dev.to
The problem with stateless mocks
Last month I was integrating with a payment API. I wrote my tests against a mock server, everything passed, shipped to staging — and the whole flow broke.
The mock told me POST /charges returns {"id": "ch_123"}. And it does. But my code then called GET /charges/ch_123 to verify the status, and the mock returned 404 because the mock doesn’t actually store anything. Every request lives in its own universe.
I lost half a day to this. And it wasn’t the first time.
I’ve used Prism, WireMock, Mockoon — they’re solid tools. You point them at an OpenAPI spec and they generate responses, but the responses are canned. There’s no memory between requests:
POST /customers → 201 {"id": "cust_123"}
GET /customers/cust_123 → 404 # has no idea you just created this
This works fine for unit tests where you’re testing your HTTP client. It falls apart the moment you have a multi‑step flow.
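To make the difference concrete, here is a minimal Python sketch. It is not how Prism, WireMock, or Mockoon are implemented; `canned_mock` and `StatefulFake` are hypothetical names made up for illustration. The first answers from a fixed table, the second keeps a dictionary of what it created.

```python
import itertools

# Canned responses keyed by (method, path): no memory between requests.
CANNED = {
    ("POST", "/customers"): (201, {"id": "cust_123"}),
}


def canned_mock(method, path):
    """Stateless mock: answers only what the table lists."""
    return CANNED.get((method, path), (404, {"error": "not found"}))


class StatefulFake:
    """Hypothetical in-memory fake that remembers created customers."""

    def __init__(self):
        self._customers = {}
        self._ids = itertools.count(1)

    def handle(self, method, path):
        if (method, path) == ("POST", "/customers"):
            customer = {"id": f"cust_{next(self._ids)}"}
            self._customers[customer["id"]] = customer
            return 201, customer
        if method == "GET" and path.startswith("/customers/"):
            customer = self._customers.get(path.rsplit("/", 1)[-1])
            if customer is not None:
                return 200, customer
        return 404, {"error": "not found"}


# Run the same two-step flow against both.
_, created = canned_mock("POST", "/customers")
mock_status, _ = canned_mock("GET", f"/customers/{created['id']}")

fake = StatefulFake()
_, created = fake.handle("POST", "/customers")
fake_status, fetched = fake.handle("GET", f"/customers/{created['id']}")

print(mock_status, fake_status)  # 404 200
```

The only real difference is the dictionary: once the fake stores what it returns, the second request can find it.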
Think about how a real Stripe integration works:
- Create a customer
- Create a payment intent for that customer (needs the customer ID from step 1)
- Confirm the payment intent (needs the PI ID from step 2)
- A webhook fires (your server needs to handle it)
A mock server can’t do steps 2‑4. The IDs don’t carry over. The webhook never fires. You’re testing a fantasy.
What I actually needed
I needed a sandbox where:
- POST creates a real resource I can GET later
- IDs chain between requests like they would in production
- State transitions work (a charge goes from pending to succeeded)
- Webhooks fire when things change
In short – not a mock, but a tiny fake version of the actual API that behaves like the real thing.
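As a rough illustration of that wish list, the sketch below combines stored resources, a transition table, and event callbacks in a few lines of Python. Everything here is an assumption for the sake of the example: `TinySandbox`, the `charge.*` event names, and the transition table are invented and say nothing about how any real API or tool models charges.

```python
from dataclasses import dataclass, field
from typing import Callable

# Allowed status transitions for a charge (assumed; real APIs have more states).
TRANSITIONS = {"pending": {"succeeded", "failed"}}


@dataclass
class TinySandbox:
    charges: dict = field(default_factory=dict)
    listeners: list = field(default_factory=list)

    def on_event(self, callback: Callable[[str, dict], None]) -> None:
        """Register a callback that stands in for a webhook endpoint."""
        self.listeners.append(callback)

    def _emit(self, event: str, charge: dict) -> None:
        for callback in self.listeners:
            callback(event, dict(charge))  # pass a copy, like a webhook payload

    def create_charge(self, amount: int) -> dict:
        charge_id = f"ch_{len(self.charges) + 1}"
        charge = {"id": charge_id, "amount": amount, "status": "pending"}
        self.charges[charge_id] = charge
        self._emit("charge.created", charge)
        return charge

    def transition(self, charge_id: str, new_status: str) -> dict:
        charge = self.charges[charge_id]  # KeyError if it was never created
        if new_status not in TRANSITIONS.get(charge["status"], set()):
            raise ValueError(f"{charge['status']} -> {new_status} is not allowed")
        charge["status"] = new_status
        self._emit(f"charge.{new_status}", charge)
        return charge


sandbox = TinySandbox()
events = []
sandbox.on_event(lambda name, payload: events.append((name, payload["status"])))

charge = sandbox.create_charge(amount=500)
sandbox.transition(charge["id"], "succeeded")

print(events)
# [('charge.created', 'pending'), ('charge.succeeded', 'succeeded')]
```

Because transitions go through one table, an illegal move such as succeeded back to pending raises an error instead of silently passing.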
So I built one
I’ve been heads‑down on FetchSandbox for a few months now. You give it an OpenAPI spec and it generates a stateful sandbox with seed data, state machines, and webhook events.
npm install -g fetchsandbox
fetchsandbox generate ./stripe-openapi.yaml
# ✓ Sandbox ready: 587 endpoints, 63 seed records
fetchsandbox run stripe --all
# ✓ Accept a payment — 3/3 steps passed
# ✓ Onboard a connected account — 3/3 steps passed
# ✓ Respond to a dispute — 2/2 steps passed
# ✓ All workflows passed — 3/3 (9ms)
The run --all command is the thing I wish I’d had. It executes every integration workflow end‑to‑end: creating resources, chaining IDs between steps, and verifying each response. If something breaks, you see exactly which step failed and why.
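The general idea of a workflow runner that chains IDs and names the failing step can be sketched in Python as follows. This is not FetchSandbox's code; `run_workflow`, `fake_api`, and `stateless_api` are hypothetical names, and the step format is an assumption chosen to keep the example short.

```python
def run_workflow(steps, call):
    """Run steps in order, chaining values from earlier responses.

    steps: list of (name, build_request, expected_status), where
           build_request takes the context dict and returns (method, path).
    call:  function (method, path) -> (status, body).
    """
    context = {}
    for index, (name, build_request, expected) in enumerate(steps, start=1):
        method, path = build_request(context)
        status, body = call(method, path)
        if status != expected:
            return f"step {index} failed: {method} {path} -> {status}, expected {expected}"
        context[name] = body  # later steps can read IDs from this response
    return f"{len(steps)}/{len(steps)} steps passed"


store = {}


def fake_api(method, path):
    """Stateful stand-in: remembers the customer it created."""
    if (method, path) == ("POST", "/customers"):
        store["cust_1"] = {"id": "cust_1"}
        return 201, store["cust_1"]
    if method == "GET" and path.startswith("/customers/"):
        record = store.get(path.rsplit("/", 1)[-1])
        return (200, record) if record else (404, {})
    return 404, {}


def stateless_api(method, path):
    """Canned stand-in: POST always succeeds, nothing is stored."""
    return (201, {"id": "cust_123"}) if method == "POST" else (404, {})


steps = [
    ("customer", lambda ctx: ("POST", "/customers"), 201),
    ("lookup", lambda ctx: ("GET", f"/customers/{ctx['customer']['id']}"), 200),
]

print(run_workflow(steps, fake_api))
# 2/2 steps passed
print(run_workflow(steps, stateless_api))
# step 2 failed: GET /customers/cust_123 -> 404, expected 200
```

The second run reproduces the staging failure from the start of this post: the create step passes, and the lookup that depends on it does not.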
The stuff that surprised me while building it
Error scenarios were harder than happy paths
I added a --scenario flag so you can switch the whole sandbox to “auth_failure” mode and see what happens:
fetchsandbox run stripe accept_payment --scenario auth_failure
# ✗ Step 1: POST /v1/payment_intents → 401 Unauthorized
# Scenario "auth_failure" correctly caused failure.
# Scenario reset to default.
My code had a bug where it didn’t handle 401 on the payment‑intent endpoint, only on the customer endpoint. I would never have caught that with a regular mock.
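Here is a small Python sketch of why a sandbox-wide scenario switch catches this kind of bug. It is an illustration only: `ScenarioFake` and the two client functions are hypothetical, the missing 401 branch is planted on purpose, and none of it reflects how the --scenario flag is implemented.

```python
class ScenarioFake:
    """Hypothetical fake whose behaviour changes with a named scenario."""

    def __init__(self):
        self.scenario = "default"

    def handle(self, method, path):
        if self.scenario == "auth_failure":
            return 401, {"error": "unauthorized"}  # every endpoint rejects
        return 200, {"ok": True}


def create_customer(api):
    status, _ = api.handle("POST", "/v1/customers")
    if status == 401:
        return "reauthenticate"
    return "ok"


def create_payment_intent(api):
    # Deliberate bug for illustration: the 401 branch is missing.
    api.handle("POST", "/v1/payment_intents")
    return "ok"


api = ScenarioFake()
api.scenario = "auth_failure"
unhandled = [
    fn.__name__
    for fn in (create_customer, create_payment_intent)
    if fn(api) != "reauthenticate"
]
api.scenario = "default"  # reset so later tests see the happy path

print(unhandled)  # ['create_payment_intent']
```

Flipping the scenario once and exercising every client call is what exposes the endpoint that forgot its error handling; a canned mock would need a separate 401 stub per endpoint.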
Webhooks were a rabbit hole
In a real Stripe integration, half the logic lives in webhook handlers. The sandbox now fires webhook events when resources mutate, and you can watch them in real‑time:
fetchsandbox webhook-listen stripe
# 12:04:31 payment_intent.created pi_xyz → requires_confirmation
# 12:04:32 payment_intent.succeeded pi_xyz → succeeded
Inspecting state is underrated
After running a workflow, you can see exactly what’s in the sandbox:
fetchsandbox state stripe customers
# customers — 3 records
# ┌──────────────┬─────────────────┬──────────┐
# │ id │ email │ status │
# ├──────────────┼─────────────────┼──────────┤
# │ cust_abc123 │ test@acme.com │ active │
# └──────────────┴─────────────────┴──────────┘
How it compares to the alternatives
| Feature | Mock server (Prism) | Vendor sandbox (Stripe test mode) | FetchSandbox |
|---|---|---|---|
| Setup time | 1 min | 15‑30 min (account + keys) | |

Curious what other people’s testing setups look like for third‑party APIs.
Do you mock everything? Use vendor test modes? Some hybrid? Drop a comment. I’ve been deep in this problem for months and I’m still learning.
