National Vaccine Appointment & Administration System
Source: Dev.to
A few years ago, I had a system‑design interview. The interviewer gave me this scenario:
Design a national vaccine appointment booking system.
Millions of citizens need to register and book slots. Clinics must administer the doses. The government needs audit logs and fraud prevention.
My first thought was simple: just let people book a slot, check the stock, and confirm. I drew a basic flow on the whiteboard and felt pretty good about it. Then the interviewer started asking harder questions.
- “What if two people try to book the last slot at the same time?”
- “What if the clinic runs out of doses after the booking is already confirmed?”
- “How do you undo things if the eligibility check fails in the middle?”
I didn’t have good answers. I had only designed for the happy path.
That interview stuck in my mind. Months later, while researching inventory‑reservation patterns for an internet‑credit‑purchase system, I realized the same ideas could have helped me in that interview. So I went back to the problem and redesigned it. This is what I came up with.
What I originally proposed (during the interview)
User selects clinic → slot → vaccine type → system confirms → appointment created
Problems that appear quickly
| Problem | Description |
|---|---|
| Race conditions | Two people click “Book” at the same time for the last slot. Both get confirmed → one citizen ends up with no slot. |
| Stock mismatch | Slot is confirmed, but the clinic runs out of vaccine doses between booking day and appointment day. |
| Late eligibility failure | System confirms appointment first, then discovers the citizen doesn’t meet age/insurance requirements. The slot/dose is already allocated. |
| No rollback | If something fails in the middle, there is no way to release the slot or dose back to the pool. |
These are the same problems I later found when designing the internet‑credit‑purchase system: the happy path is not enough when you deal with limited resources and many concurrent users.
Core Insight
Don’t confirm anything until everything is verified.
Use a multi‑stage process: temporary hold → verification → final confirmation. If any step fails, roll back.
It’s exactly how concert‑ticket sales work: a seat is held while you pay; if you don’t finish in time, the seat goes back.
Full Flow of the Improved Design
1. Create a Temporary Reservation
- User selects clinic, time slot, and vaccine type.
- System creates a temporary reservation in Redis with a TTL (e.g., 5 minutes).
- Appointment status = PENDING.
- Slot capacity and vaccine‑dose count are decreased temporarily so other users see reduced availability.
Why Redis?
- Fast, in‑memory, supports TTL out‑of‑the‑box.
- A relational DB could work, but you’d need a scheduled job to clean up expired reservations. Redis automatically expires keys.
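A minimal sketch of the hold‑with‑TTL idea, using an in‑memory dictionary as a stand‑in for Redis so it runs without a server. The names `HoldStore`, `place_hold`, and `available` are made up for illustration. One assumption to flag: this sketch derives availability from the live holds, so an expired hold frees its slot on its own. With a separate counter, as described in the steps above, key expiry alone would not restore the counter and would need its own release step.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HoldStore:
    """In-memory stand-in for Redis keys with a TTL (illustrative only)."""
    ttl_seconds: float = 300.0  # 5 minutes, as in the text
    _holds: dict = field(default_factory=dict)  # reservation_id -> (slot_id, expires_at)

    def place_hold(self, reservation_id: str, slot_id: str,
                   now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._holds[reservation_id] = (slot_id, now + self.ttl_seconds)

    def active_holds(self, slot_id: str, now: Optional[float] = None) -> int:
        now = time.monotonic() if now is None else now
        # Drop expired holds lazily, the way an expired key simply disappears.
        self._holds = {rid: h for rid, h in self._holds.items() if h[1] > now}
        return sum(1 for held_slot, _ in self._holds.values() if held_slot == slot_id)


def available(capacity: int, store: HoldStore, slot_id: str,
              now: Optional[float] = None) -> int:
    """Capacity still bookable: total capacity minus unexpired holds."""
    return capacity - store.active_holds(slot_id, now)
```

Passing `now` explicitly is only there to make the expiry behaviour easy to demonstrate; in normal use the monotonic clock is read automatically.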
2. Handle Race Conditions on Redis
- Use the atomic DECR command on the slot counter.
- If the result drops below zero, no slot was left: the request is rejected and the counter is incremented back.
- For extra safety, wrap the check‑and‑decrement in a Lua script so it’s a single atomic operation.
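The check‑and‑decrement idea can be sketched as follows. The Lua body is what one might load into Redis (the key name passed as `KEYS[1]` is hypothetical), and it is shown only as a string. The class below it is an in‑memory stand‑in where a lock plays the role of Redis running the script as one atomic step, so the example runs without a Redis server.

```python
import threading

# Lua body one might load into Redis so the check and the decrement run as a
# single atomic step. KEYS[1] is a hypothetical slot-counter key.
CHECK_AND_DECR_LUA = """
local remaining = tonumber(redis.call('GET', KEYS[1]) or '0')
if remaining <= 0 then return 0 end
redis.call('DECR', KEYS[1])
return 1
"""


class SlotCounter:
    """In-memory stand-in: the lock mimics Redis executing the script atomically."""

    def __init__(self, remaining: int) -> None:
        self._remaining = remaining
        self._lock = threading.Lock()

    def try_reserve(self) -> bool:
        with self._lock:  # check and decrement happen as one step
            if self._remaining <= 0:
                return False
            self._remaining -= 1
            return True


def simulate_rush(slots: int, users: int) -> int:
    """Fire `users` concurrent booking attempts; return how many succeeded."""
    counter = SlotCounter(slots)
    results = []
    threads = [
        threading.Thread(target=lambda: results.append(counter.try_reserve()))
        for _ in range(users)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)
```

With one slot and fifty simultaneous attempts, `simulate_rush(1, 50)` lets exactly one booking through, which is the property the interviewer's first question was probing.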
3. Run Eligibility Checks (while the slot is held)
| Check | Description |
|---|---|
| Age | Some vaccines are only for 60+. |
| Insurance | Verify via external API. |
| Medical history | Allergies, previous doses, etc. |
| Geography | Citizen must belong to the right region. |
If any check fails:
- Delete the Redis reservation.
- Increment the temporary slot and dose counters (release the hold).
- Return a clear error message (e.g., “You are not eligible because …”).
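A rough sketch of running the checks while the slot is held and releasing the hold on the first failure. The `Citizen` fields, the individual check functions, and the `release_hold` callback are all assumptions made for illustration; the insurance check in particular is a local stand‑in for the external API call.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Citizen:
    age: int
    region: str
    insured: bool


# Each check returns None on success or a human-readable reason on failure.
Check = Callable[[Citizen], Optional[str]]


def age_check(min_age: int) -> Check:
    return lambda c: None if c.age >= min_age else f"the minimum age is {min_age}"


def region_check(region: str) -> Check:
    return lambda c: None if c.region == region else f"this clinic serves region {region} only"


def insurance_check(c: Citizen) -> Optional[str]:
    # Stand-in for the external insurance API call.
    return None if c.insured else "no active insurance was found"


def verify_or_release(citizen: Citizen, checks: Iterable[Check],
                      release_hold: Callable[[], None]) -> Optional[str]:
    """Run checks in order; on the first failure release the hold and return an error."""
    for check in checks:
        reason = check(citizen)
        if reason is not None:
            release_hold()  # give the slot and dose back to the pool
            return f"You are not eligible because {reason}."
    return None  # all checks passed; the hold stays in place
```

Returning the reason as data, rather than raising, keeps the rollback and the user‑facing message in one place.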
4. Final Confirmation (all checks passed)
- Persistently decrease slot capacity and vaccine stock in the main database.
- Update appointment status: PENDING → CONFIRMED.
- Delete the Redis reservation (no longer needed).
- Send confirmation to the citizen (SMS, email, push).
This is the point of no return. Everything before this step can be undone.
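One way to make the final confirmation all‑or‑nothing is a single database transaction. The sketch below uses SQLite from the standard library as a stand‑in for the main database so it runs anywhere; the table and column names are invented for the example. The guarded `UPDATE ... WHERE capacity > 0` statements make sure neither count can go negative, and any failure rolls back every change.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical minimal schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE slots (id TEXT PRIMARY KEY, capacity INTEGER NOT NULL);
        CREATE TABLE stock (clinic_id TEXT, vaccine TEXT, doses INTEGER NOT NULL,
                            PRIMARY KEY (clinic_id, vaccine));
        CREATE TABLE appointments (id TEXT PRIMARY KEY, slot_id TEXT, clinic_id TEXT,
                                   vaccine TEXT, status TEXT NOT NULL);
    """)
    return db


def confirm(db: sqlite3.Connection, appointment_id: str) -> bool:
    """Move PENDING -> CONFIRMED and decrement slot and stock, or change nothing."""
    try:
        with db:  # commits on success, rolls back if an exception escapes
            row = db.execute(
                "SELECT slot_id, clinic_id, vaccine FROM appointments "
                "WHERE id = ? AND status = 'PENDING'", (appointment_id,)
            ).fetchone()
            if row is None:
                raise LookupError("no pending appointment")
            slot_id, clinic_id, vaccine = row
            took_slot = db.execute(
                "UPDATE slots SET capacity = capacity - 1 "
                "WHERE id = ? AND capacity > 0", (slot_id,)
            ).rowcount
            took_dose = db.execute(
                "UPDATE stock SET doses = doses - 1 "
                "WHERE clinic_id = ? AND vaccine = ? AND doses > 0",
                (clinic_id, vaccine),
            ).rowcount
            if took_slot != 1 or took_dose != 1:
                raise LookupError("slot or stock exhausted")
            db.execute(
                "UPDATE appointments SET status = 'CONFIRMED' WHERE id = ?",
                (appointment_id,),
            )
        return True
    except LookupError:
        return False
```

If the clinic has a free slot but no doses, `confirm` returns `False` and the slot capacity is left untouched, because the first `UPDATE` is rolled back together with everything else.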
5. Arrival at the Clinic
- Staff scans the citizen’s QR code (contains appointment ID + verification hash).
- Server verifies the QR code against the appointment record.
- Staff records vaccine batch number and administration time.
- Appointment status → ADMINISTERED.
- Emit an event for analytics, government reporting, audit logs.
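The post does not say how the verification hash is computed. One common option, assumed here, is an HMAC over the appointment ID with a secret known only to the server. The payload format (`id.tag`) and both function names are invented for this sketch.

```python
import hashlib
import hmac
from typing import Optional


def make_qr_payload(appointment_id: str, secret: bytes) -> str:
    """Build the string encoded in the QR code: '<appointment_id>.<hmac tag>'."""
    tag = hmac.new(secret, appointment_id.encode(), hashlib.sha256).hexdigest()
    return f"{appointment_id}.{tag}"


def verify_qr_payload(payload: str, secret: bytes) -> Optional[str]:
    """Return the appointment ID if the tag is valid, otherwise None."""
    appointment_id, sep, tag = payload.rpartition(".")
    if not sep:
        return None  # malformed payload, no tag present
    expected = hmac.new(secret, appointment_id.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if hmac.compare_digest(tag.encode(), expected.encode()):
        return appointment_id
    return None
```

A valid tag only proves the QR code was issued by the server. The server would still look up the appointment record to confirm it is CONFIRMED and scheduled for this clinic and time window.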
Failure‑Handling Scenarios
| Failure | Handling |
|---|---|
| No‑show | A scheduled job scans for CONFIRMED appointments whose time window has passed. Status → NO_SHOW; stock is released back. |
| Citizen cancels | Cancellation via portal releases stock immediately. |
| Clinic cancels a slot | All affected appointments are flagged, citizens are notified, and they can re‑book with priority. |
| External API down (e.g., insurance) | Use circuit‑breaker pattern. After N consecutive failures, stop calling the API temporarily. Booking is either queued for retry (exponential back‑off) or allowed provisionally with a flag for manual review. |
| Redis goes down | Fall back to database‑level reservations with a cleanup job. Slower, but booking still works. |
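The circuit‑breaker row can be sketched roughly like this. The thresholds, the `CircuitOpenError` name, and the injectable `clock` parameter are assumptions for illustration; a production system would more likely use an existing resilience library than a hand‑rolled class.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses to call the dependency."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_failures = max_failures
        self._reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after:
                raise CircuitOpenError("dependency temporarily disabled")
            # Cool-down elapsed: allow one trial call (half-open state).
            # A single further failure will open the circuit again.
            self._opened_at = None
            self._failures = self._max_failures - 1
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self._max_failures:
                self._opened_at = self._clock()  # open the circuit
            raise
        self._failures = 0  # any success closes the circuit fully
        return result
```

When `CircuitOpenError` is raised, the booking service can choose between the two options in the table: queue the check for a later retry, or confirm provisionally and flag the appointment for manual review.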
High‑Level Architecture
+-------------------+      +-------------------+      +-------------------+
|     Frontend      | ---> |    API Gateway    | ---> |   Auth Service    |
| (Booking portal & |      | (Auth, rate-limit,|      | (Login, ID check) |
|  clinic dashboard)|      |  routing)         |      +-------------------+
+-------------------+      +-------------------+
                                     |
                           +-------------------+
                           |  Booking Service  |
                           | (Reservation,     |
                           |  eligibility,     |
                           |  confirmation)    |
                           +-------------------+
                                     |
          +----------------------+---+------------------+----------------------+
          |                      |                      |                      |
+-------------------+  +-------------------+  +-------------------+  +-------------------+
|    Redis Cache    |  |   Relational DB   |  |   External APIs   |  |    Messaging /    |
| (Temp holds, TTL) |  | (Appointments,    |  | (Insurance,       |  |     Event Bus     |
|                   |  |  Stock, Logs)     |  |  Medical, etc.)   |  |   (Kafka, SNS…)   |
+-------------------+  +-------------------+  +-------------------+  +-------------------+
- Frontend – Web/mobile portal for citizens; dashboard for clinic staff.
- API Gateway – Handles authentication, global rate limiting (crucial during mass booking), routing to micro‑services.
- Auth Service – National ID verification, token issuance.
- Booking Service – Core logic: temporary reservation, eligibility checks, final confirmation, cancellation handling.
- Redis Cache – Fast, TTL‑based temporary holds.
- Relational DB – Persistent storage of appointments, stock levels, audit logs.
- External APIs – Insurance verification, medical‑history lookup, etc.
- Messaging / Event Bus – Emits events for analytics, reporting, and eventual consistency across subsystems.
Takeaways
- Never trust the happy path when dealing with scarce resources.
- Atomic, temporary holds (Redis DECR/Lua) prevent race conditions.
- Multi‑stage workflow (PENDING → CONFIRMED → ADMINISTERED) gives clear rollback points.
- TTL‑based reservations automatically clean up abandoned attempts.
- Circuit breakers & fallbacks keep the system resilient when a downstream dependency fails.
Applying these inventory‑reservation patterns turned a naïve “book‑and‑confirm” design into a robust, production‑ready national vaccine‑booking system.
Services Overview
| Service | Responsibility |
|---|---|
| Patient Service | Medical records, vaccination history |
| Clinic Service | Slot management, staff schedules, capacity |
| Inventory Service | Vaccine stock per clinic, batch tracking |
| Appointment Service | Manages reservations, confirmations, and status changes |
| Eligibility Service | Rules engine + external API calls |
| Notification Service | SMS, email, push; retries if delivery fails |
| Audit Service | Append‑only logs for every status change (required for government compliance) |
Data Layer
- PostgreSQL – permanent data storage
- Redis – temporary reservations and caching
Asynchronous Messaging
- Kafka topics for events:
  - AppointmentReserved
  - AppointmentConfirmed
  - AppointmentAdministered
  - AppointmentCancelled
These events keep services decoupled and make the system auditable by default.
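To illustrate how status changes and events line up, here is a small sketch in which every allowed transition publishes the matching event. The `EventLog` class is an append‑only list standing in for a Kafka topic, and the transition table is my assumption about which moves are legal. NO_SHOW is left out because the list above does not name an event for it.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "PENDING"
    CONFIRMED = "CONFIRMED"
    ADMINISTERED = "ADMINISTERED"
    CANCELLED = "CANCELLED"


# Assumed transition rules; terminal states have no outgoing transitions.
ALLOWED = {
    Status.PENDING: {Status.CONFIRMED, Status.CANCELLED},
    Status.CONFIRMED: {Status.ADMINISTERED, Status.CANCELLED},
}

EVENT_FOR = {
    Status.CONFIRMED: "AppointmentConfirmed",
    Status.ADMINISTERED: "AppointmentAdministered",
    Status.CANCELLED: "AppointmentCancelled",
}


@dataclass
class EventLog:
    """Append-only list standing in for a Kafka topic."""
    events: list = field(default_factory=list)

    def publish(self, name: str, appointment_id: str) -> None:
        self.events.append((name, appointment_id))


@dataclass
class Appointment:
    id: str
    status: Status = Status.PENDING


def reserve(appointment_id: str, log: EventLog) -> Appointment:
    """Create a PENDING appointment and announce the reservation."""
    log.publish("AppointmentReserved", appointment_id)
    return Appointment(appointment_id)


def transition(appt: Appointment, new_status: Status, log: EventLog) -> None:
    """Apply a status change only if it is allowed, then publish its event."""
    if new_status not in ALLOWED.get(appt.status, set()):
        raise ValueError(f"{appt.status.value} -> {new_status.value} is not allowed")
    appt.status = new_status
    log.publish(EVENT_FOR[new_status], appt.id)
```

Because the event is published in the same function that changes the status, an audit consumer reading the log sees every state change in order without the booking code calling it directly.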
Lessons Learned from the Interview
“Looking back at that interview, the biggest thing I missed was not about technology—it was about mindset. I jumped to the happy path because it felt complete. But the interviewer was not testing if I can design a booking form. They were testing if I can think about what happens when things go wrong.”
Key Takeaways
- Start with failure scenarios, not the happy path – Ask yourself “what can go wrong at each step?” before finalizing any design.
- Temporary reservation is a pattern, not a hack – Whether it’s concert tickets, flash sales, or vaccine slots, limited stock and many users demand a hold‑then‑confirm flow.
- Be explicit about rollbacks – “We’ll handle errors” is not a design. Specify what happens to the data, the stock, and the user when something fails.
- Plan for external service outages – Insurance APIs or notification services can go down. Circuit breakers and retry queues are not optional; they are necessary.
- Study inventory reservation patterns – My earlier post on designing an internet credit purchase system covers these patterns with more detail and code examples. The core idea – reserve first, verify, then commit – appears in many systems once you start looking.
Call for Feedback
Thanks for reading. If you have faced similar interview questions or have ideas to improve this design, I would like to hear about them in the comments.