The Architecture Behind a Stateless AI Application
The project started with a risky‑looking decision: no backend database.
At the time there was no need to persist user data; getting a diagnosis back to the user was the priority. Most tutorials assume you’ll store accounts, sessions, and data in PostgreSQL, MongoDB, DynamoDB, etc., but this app doesn’t need to persist anything across devices.
The Three‑Layer Split

- Frontend – Handles all user interaction, UI state, image compression, multi‑step wizard flow, local history, and result rendering.
- Backend – One responsibility: transform image data into diagnosis data. It receives a request, builds a prompt, calls the LLM, parses the response, and returns structured JSON. No state, sessions, or database.
- AI Layer – Claude Vision receives the images with crafted prompts and returns detailed diagnostic information.
Each layer does exactly one thing. Mixing responsibilities (e.g., storing history in the backend or calling the LLM directly from the frontend) adds unnecessary complexity.
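To make that boundary concrete, here is a rough TypeScript sketch of the frontend's only job at the seam: send the compressed media and the user's key, get structured JSON back. The request fields, header name, and response handling are illustrative assumptions, not the app's exact contract.

```typescript
// Hypothetical request shape for the layer boundary; field names are assumptions.
interface AnalyzeRequest {
  mode: "single" | "batch" | "video";
  images: string[];          // base64-encoded, already compressed client-side
  cropType?: string;
  plantPart?: string;
  context?: string;
}

async function analyze(req: AnalyzeRequest, apiKey: string) {
  const res = await fetch("/api/v1/analyze", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-API-Key": apiKey,   // passes through the server, never stored there
    },
    body: JSON.stringify(req),
  });
  if (!res.ok) throw new Error(`Analysis failed: ${res.status}`);
  return res.json();         // structured diagnosis, cached locally by the frontend
}
```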
Data Interaction

- History and settings never leave the user’s device.
- The API key passes through the server but is never stored.
The Multi‑Step Wizard: Why State Machines
The scan flow can include up to five steps:
- Plant part selection
- Crop type selection
- Media upload
- Analysis mode (single vs. multiple images)
- Context entry
Traditional numbered steps become ambiguous because the presence of the analysis-mode step depends on runtime conditions (single vs. multiple images).
Solution: a state machine with meaningful state names (part, crop, media, mode, context, analyzing). The UI renders based on the current state, and the progress indicator is computed dynamically, keeping the user experience accurate without hard‑coded step numbers.
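A minimal TypeScript sketch of the idea follows. The state names come from the article; the transition and progress logic are illustrative, not the app's exact implementation.

```typescript
type WizardState = "part" | "crop" | "media" | "mode" | "context" | "analyzing";

interface WizardContext {
  imageCount: number; // known once the media step completes
}

// Each state knows its successor; the conditional step is decided at runtime.
function nextState(state: WizardState, ctx: WizardContext): WizardState {
  switch (state) {
    case "part":    return "crop";
    case "crop":    return "media";
    // The "mode" step only exists when more than one image was uploaded.
    case "media":   return ctx.imageCount > 1 ? "mode" : "context";
    case "mode":    return "context";
    case "context": return "analyzing";
    default:        return state;
  }
}

// The progress indicator is computed from the states that actually apply,
// so it stays accurate without hard-coded step numbers.
function progress(state: WizardState, ctx: WizardContext): string {
  const steps: WizardState[] = ["part", "crop", "media"];
  if (ctx.imageCount > 1) steps.push("mode");
  steps.push("context");
  const idx = steps.indexOf(state);
  return idx >= 0 ? `Step ${idx + 1} of ${steps.length}` : "Analyzing…";
}
```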

Storage Architecture: Three Tiers

| Tier | Purpose | Typical Content |
|---|---|---|
| Session storage | Holds consent flags that expire when the browser closes. | Consent choices for health‑related data. |
| Local storage | Persists data across sessions. | Scan history, accessibility settings (font size, voice prefs), API key. |
| Embedded deep cache | Stores the full diagnosis for each history item (all 25+ fields). | Treatments, prevention tips, full result payload. |
The deep cache increases storage size but enables true offline access—critical for rural users. A typical scan is 20–30 KB; with a maximum of ~50 scans the total is ~1.5 MB, well under the 5 MB browser quota. Older scans rotate out automatically.
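As a rough TypeScript sketch of the three tiers (storage key names and record fields are assumptions; the 50-scan cap and embedded diagnosis payload follow the article):

```typescript
interface ScanRecord {
  id: string;
  timestamp: number;
  summary: string;
  diagnosis: Record<string, unknown>; // embedded deep cache: the full 25+ field payload
}

// Tier 1: consent flags live only for the browser session.
function recordConsent(): void {
  sessionStorage.setItem("healthDataConsent", "granted");
}

// Tiers 2 + 3: history (with the full diagnosis embedded) persists locally,
// rotating out the oldest scans past a cap of ~50 entries.
const MAX_SCANS = 50;

function saveScan(scan: ScanRecord): void {
  const history: ScanRecord[] = JSON.parse(localStorage.getItem("scanHistory") ?? "[]");
  history.unshift(scan);
  localStorage.setItem("scanHistory", JSON.stringify(history.slice(0, MAX_SCANS)));
}
```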
The Single Endpoint Philosophy
The backend exposes only one API endpoint:
`POST /api/v1/analyze`

All analysis modes (single image, batch, video) are handled by a mode parameter, which adjusts prompt construction and response handling. This avoids:
- Duplicate validation logic
- Complex client‑side routing
- Versioning headaches for multiple endpoints
- Extra documentation overhead
A single, well‑documented endpoint is simpler to test and maintain.
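A sketch of what that endpoint could look like, assuming an Express-style backend (the article does not name the framework); buildPrompt, callClaudeVision, and parseDiagnosis are hypothetical helpers standing in for the real prompt, LLM, and parsing code.

```typescript
import express from "express";

// Hypothetical helpers standing in for the actual prompt/LLM/parsing logic.
declare function buildPrompt(mode: string, context?: string): string;
declare function callClaudeVision(prompt: string, images: string[], apiKey?: string): Promise<string>;
declare function parseDiagnosis(raw: string, mode: string): object;

const app = express();
app.use(express.json({ limit: "10mb" }));

// The single endpoint: the mode parameter steers prompt construction and
// response handling, and validation happens exactly once.
app.post("/api/v1/analyze", async (req, res) => {
  const { mode, images, context } = req.body;
  if (!["single", "batch", "video"].includes(mode) || !images?.length) {
    return res.status(400).json({ error: "Invalid mode or missing media" });
  }
  const prompt = buildPrompt(mode, context);
  const raw = await callClaudeVision(prompt, images, req.header("X-API-Key"));
  return res.json(parseDiagnosis(raw, mode));
});
```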
User‑Provided API Key

- Cost transparency: Users see exactly what they’re paying; no hidden markup.
- No key management: No database needed for storing or rotating keys, reducing operational complexity.
- Scalability: Each user has their own Anthropic quota, eliminating shared rate limits.
- Trust: Users retain control of their credentials; the developer cannot incur unexpected charges.
The trade‑off is friction—users must create an Anthropic account and generate an API key before using the app. This is acceptable for a technical audience but would need reconsideration for mass‑market adoption.
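As a sketch of the pass-through, here is one way the backend could forward the key to Anthropic per request without persisting it, using the TypeScript SDK; the model alias and content shape are assumptions.

```typescript
import Anthropic from "@anthropic-ai/sdk";

async function callClaudeVision(prompt: string, images: string[], apiKey: string) {
  // The client exists only for the lifetime of this request; the key is never written anywhere.
  const client = new Anthropic({ apiKey });
  const message = await client.messages.create({
    model: "claude-3-5-sonnet-latest", // assumed model alias
    max_tokens: 2048,
    messages: [
      {
        role: "user",
        content: [
          ...images.map((data) => ({
            type: "image" as const,
            source: { type: "base64" as const, media_type: "image/jpeg" as const, data },
          })),
          { type: "text" as const, text: prompt },
        ],
      },
    ],
  });
  return message.content; // parsed downstream into the structured diagnosis JSON
}
```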
Batch vs. Single Mode: The Same Plant Problem
When multiple images are uploaded, the system must decide whether they represent different plants (analyze separately) or the same plant from different angles (analyze together).
(Further details on the implementation of this decision logic continue in the original article.)