How We Built an AI SaaS on the Edge for Nearly $0 in Infrastructure Costs
Source: Dev.to
1. Introduction
A few months ago, we started building Propoza, a tool that generates business proposals with AI for Brazilian freelancers and small business owners.
The problem was practical: freelancers spend hours crafting proposals in Canva or Word, delivering generic documents that don’t protect the project scope. The tool needed to be simple — the user describes the service, the AI structures a complete proposal with scope, timeline, and payment terms.
The technical challenge: how do you run a freemium SaaS for Brazilian freelancers without blowing your budget before you have revenue?
A lean SaaS or micro SaaS typically costs between $5 and $20 per month before your first paying customer — just for server, managed database, and CDN. For a validation-phase product, that’s burned money.
We decided to try a different route: build everything on the edge, with serverless infrastructure and zero servers to manage. Here’s what worked, the trade-offs, and what we learned.
2. The problem with the traditional SaaS stack
Before choosing our stack, we calculated the minimum cost of a Brazilian SaaS running on conventional infrastructure:
| Component | Typical Provider | Estimated Monthly Cost |
| --- | --- | --- |
| Server | DigitalOcean / AWS EC2 | $8–$30 |
| Database | Managed PostgreSQL (RDS, Supabase) | $10–$40 |
| CDN | Cloudflare (paid plan) or similar | $5–$20 |
| Domain | Namecheap / Porkbun | ~$1 |
| Total | | ~$24–$91/month |
That’s not unreasonable for an established SaaS. But for a product still validating its market fit with zero paying customers, it means burning $100–500 before you even know if the model works.
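As a quick sanity check on the table, the low and high ends of each row can be added up. The sketch below only encodes the table's figures; the `CostRange` type and `totalRange` helper are illustrative names, not part of any tool.

```typescript
// Monthly cost ranges in USD, copied from the table above.
type CostRange = { component: string; low: number; high: number };

const conventionalStack: CostRange[] = [
  { component: "Server", low: 8, high: 30 },
  { component: "Database", low: 10, high: 40 },
  { component: "CDN", low: 5, high: 20 },
  { component: "Domain", low: 1, high: 1 },
];

// Add up the low ends and the high ends separately.
function totalRange(rows: CostRange[]): { low: number; high: number } {
  return rows.reduce(
    (acc, row) => ({ low: acc.low + row.low, high: acc.high + row.high }),
    { low: 0, high: 0 },
  );
}

const total = totalRange(conventionalStack);
console.log(`$${total.low}-$${total.high} per month`); // $24-$91 per month
```

Even at the low end, several months of validation on this stack reaches three figures.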
There’s another Brazil-specific problem: latency. Servers concentrated in São Paulo and Rio de Janeiro leave users in the North and Northeast with poor experiences, especially on mobile connections. A CDN helps, but adds cost.
We needed something that:
- cost near $0 in the first few months
- offered low latency across all of Brazil
- scaled without manual intervention
That’s when we looked at the edge serverless model.
3. The stack we chose (and why)
3.1 Cloudflare Workers as the edge runtime
The application’s backbone is Cloudflare Workers — a serverless runtime that executes code across 330+ data centers worldwide, including points of presence in Brazil (São Paulo, Rio de Janeiro, Fortaleza).
- Code runs in the data center closest to the user. For Brazilian users, latency typically stays under 50 ms.
- Workers run in V8 isolates that start in milliseconds, so they avoid the cold starts of Lambda-style functions. The first request is nearly as fast as subsequent ones.
- If the product goes viral and 10,000 people access it simultaneously, Workers distributes the load automatically.
The trade-off: Workers is not Node.js. It's an isolated V8 runtime with no persistent local filesystem, and only part of the Node standard library is available, through the nodejs_compat compatibility flag. Most things you need exist as web-standard APIs (fetch, Web Crypto, streams, WebSocket), but libraries that depend heavily on fs or net may not work.
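To give a feel for the programming model, here is a minimal sketch of a module-style Worker that uses only web-standard APIs (Request, Response, URL). The `/health` route is a made-up example, not one of Propoza's endpoints.

```typescript
// A Worker is an object with a fetch handler: it receives a standard
// Request and returns a standard Response. No Node.js modules involved.
const worker = {
  fetch(request: Request): Response {
    const url = new URL(request.url);

    // Hypothetical health-check route, for illustration only.
    if (url.pathname === "/health") {
      return new Response(JSON.stringify({ ok: true }), {
        status: 200,
        headers: { "Content-Type": "application/json" },
      });
    }

    return new Response("Not found", { status: 404 });
  },
};

// In a deployed Worker this object is the module's default export:
// export default worker;
```

Because the handler depends only on web-standard types, it can be exercised outside Workers by passing it a `new Request(...)` directly.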
3.2 Hono as the HTTP framework
We needed an HTTP framework that ran inside Workers with zero overhead.
Most popular Node.js frameworks (Express, Fastify, Koa) were designed for full Node.js environments and have compatibility issues. They depend on the http module, use synchronous APIs, or carry middleware too heavy for the serverless model.
Hono sidesteps all that. It’s under 14KB, runs natively on Workers, Deno, Bun, and Node.js, and is TypeScript-first with strong type inference. It supports middleware, route params, and Zod validation.
Here’s a proposal generation route:
import { Hono } from 'hono'
import { z } from 'zod'
import { zValidator } from '@hono/zod-validator'

const app = new Hono()

// Shape of the JSON body the route accepts
const proposalSchema = z.object({
  clientName: z.string().min(1),
  projectDescription: z.string().min(10),
  deliverables: z.array(z.string()).min(1),
  deadline: z.string(),
  paymentMethod: z.string()
})

// zValidator rejects invalid bodies with a 400 before the handler runs
app.post('/api/proposals/generate', zValidator('json', proposalSchema), async (c) => {
  const data = c.req.valid('json') // typed from proposalSchema
  // generateProposal is the app's own LLM call (not shown here)
  const result = await generateProposal(data)
  return c.json({ proposal: result })
})

export default app
Validating with Zod at the edge of the request, combined with Hono's inferred types, catches malformed payloads before they reach the handler, without adding noticeable processing overhead.
3.3 Database on the edge
The database choice was the hardest decision. For an application that needs to persist proposals, users, and settings, the relational model is still the most natural fit.
We examined two options in the edge ecosystem:
D1 (Cloudflare): Managed SQLite, accessed from the Worker through a binding, with no connection pools to manage.
Turso: Distributed SQLite with per-region replication.
We went with D1 for its native integration with Workers. You define the schema locally, run migrations with wrangler d1 migrations apply, and query the database from the Worker through its binding instead of a connection string.
CREATE TABLE proposals (
  id TEXT PRIMARY KEY,
  user_id TEXT NOT NULL,
  client_name TEXT NOT NULL,
  content TEXT NOT NULL,
  status TEXT DEFAULT 'draft',
  created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
  updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE users (
  id TEXT PRIMARY KEY,
  email TEXT UNIQUE NOT NULL,
  name TEXT NOT NULL,
  proposals_count INTEGER DEFAULT 0,
  created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
Issues we ran into:
- D1 doesn't support all SQLite queries. ALTER TABLE with complex constraints, RETURNING, and some window functions weren't available when we tested. Always test with wrangler dev before deploying.
- Write latency is higher than read latency. The free plan prioritizes eventual consistency, which is fine for business proposals.
- Migrations work, but altering tables with lots of data requires planning.
- Heavy write access sometimes needs distribution via queues or KV-backed caching.
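To make the D1 side concrete, here is a minimal sketch of a write against the users table above, issued through the Worker's D1 binding. It assumes proposals_count is used as a free-plan counter (the column is in the schema; this use of it is our illustration), and `D1Like` is a trimmed stand-in for the binding's prepare/bind/run methods so the snippet stays self-contained. The helper names are hypothetical.

```typescript
// Trimmed stand-in for the D1 binding: only the methods this sketch uses.
interface D1Like {
  prepare(sql: string): {
    bind(...values: unknown[]): {
      run(): Promise<{ meta: { changes: number } }>;
    };
  };
}

// Build the statement separately so it can be inspected or unit-tested.
// The WHERE clause makes the increment conditional, so the limit check
// and the write happen in a single statement.
function quotaStatement(userId: string, limit: number) {
  return {
    sql:
      "UPDATE users SET proposals_count = proposals_count + 1 " +
      "WHERE id = ? AND proposals_count < ?",
    params: [userId, limit] as const,
  };
}

// Returns true when a row was updated, i.e. the user was under the limit.
async function tryConsumeQuota(
  db: D1Like,
  userId: string,
  limit: number,
): Promise<boolean> {
  const { sql, params } = quotaStatement(userId, limit);
  const result = await db.prepare(sql).bind(...params).run();
  return result.meta.changes === 1;
}
```

Inside a Hono handler this could be called as `await tryConsumeQuota(c.env.DB, userId, 5)`, where `DB` is whatever name the D1 binding has in the Wrangler configuration (an assumption here).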
3.4 Cloudflare AI Gateway
The most expensive part of an AI SaaS isn't infrastructure, it's the LLM API. Each call costs a fraction of a cent, but hundreds of calls per day add up fast.
The AI Gateway acts as a proxy between the Worker and the LLM API:
Response caching: when a request repeats an earlier one exactly, the response comes from cache instead of the LLM.
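Rate limiting: on the free plan, we limit users to 5 proposals/month. The gateway's own rate limits are configured per gateway, so they cap overall traffic with zero extra code, while a per-user quota still needs a counter in the app.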
Observability: latency, tokens, and error rates are logged automatically.
The request flow:
Request -> Worker -> AI Gateway -> Cache hit?  Return cached response
                                -> Cache miss? LLM API -> Cache the response -> Return
Caching cut about 40% of actual LLM calls in the first few weeks. Many users test with similar inputs.
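Routing calls through the gateway is mostly a matter of changing the URL the Worker calls. The sketch below builds a request for an OpenAI-compatible chat endpoint behind AI Gateway. The account ID, gateway name, provider, and model are placeholders, and since the provider behind Propoza isn't named here, the provider path and payload shape should be read as assumptions.

```typescript
// Placeholder identifiers: real values come from the Cloudflare dashboard.
interface GatewayConfig {
  accountId: string;
  gatewayId: string;
  provider: string; // e.g. "openai"; assumed for this sketch
}

// AI Gateway endpoints follow the pattern
// /v1/{account_id}/{gateway_id}/{provider}/...
function gatewayUrl(cfg: GatewayConfig, path: string): string {
  return `https://gateway.ai.cloudflare.com/v1/${cfg.accountId}/${cfg.gatewayId}/${cfg.provider}/${path}`;
}

// Build (but don't send) the request, so the caller decides when to fetch it.
function buildChatRequest(cfg: GatewayConfig, apiKey: string, prompt: string): Request {
  return new Request(gatewayUrl(cfg, "chat/completions"), {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: "example-model", // placeholder model name
      messages: [{ role: "user", content: prompt }],
    }),
  });
}
```

In the Worker, `await fetch(buildChatRequest(cfg, env.LLM_API_KEY, prompt))` would send it, with `LLM_API_KEY` being a hypothetical secret name.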
3.5 Frontend: React + Vite + Cloudflare Pages
The frontend is built with React + Vite and hosted on Cloudflare Pages. The free tier covers 500 builds/month and unlimited bandwidth for static sites.
Communication with the API is via typed fetch, leveraging the fact that both Worker and frontend share the same TypeScript types:
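// shared types
interface GenerateRequest {
  clientName: string
  projectDescription: string
  deliverables: string[]
  deadline: string
  paymentMethod: string
}

// client-side call (requestProposal is an illustrative wrapper name)
async function requestProposal(request: GenerateRequest) {
  const response = await fetch('/api/proposals/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(request)
  })
  // fetch only rejects on network errors, so check the status explicitly
  if (!response.ok) throw new Error(`Request failed: ${response.status}`)
  return response.json()
}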
One detail that made a difference: PDFs are generated client-side, using libraries like @react-pdf/renderer or html2canvas + jspdf. The Worker never allocates memory or CPU for PDF rendering — the user’s browser handles it.
4. The real cost
After a few weeks in production with dozens of active users:
| Component | Service | Cost/Month |
| --- | --- | --- |
| API / Backend | Cloudflare Workers (Free Tier) | $0 |
| Database | Cloudflare D1 (Free Tier) | $0 |
| Frontend / Hosting | Cloudflare Pages (Free Tier) | $0 |
| AI (LLM) | External API (cached via AI Gateway) | ~$1–$4 |
| Domain | Namecheap / Porkbun | ~$1 |
| Total | | < $5/month |
The only variable cost is the LLM. It scales with actual usage, not with registered users. If someone signs up but never generates a proposal, the cost is zero.
This isn’t forever. When we scale to thousands of users, the Workers free tier will need an upgrade ($5+/month after 100k requests/day), and D1 may need a paid plan. But for PMF validation, it buys you months of experimentation.
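A rough way to see how far the free tier stretches is to multiply daily active users by the requests each one triggers. The 100k requests/day ceiling is the figure quoted above; the 40 requests per user per day in the examples is an assumed number for illustration, not a measurement from Propoza.

```typescript
// Free-tier request ceiling quoted above.
const FREE_TIER_REQUESTS_PER_DAY = 100_000;

// Rough estimate: daily active users times requests each user triggers per day.
function estimateDailyRequests(dailyActiveUsers: number, requestsPerUser: number): number {
  return dailyActiveUsers * requestsPerUser;
}

// True while the estimate stays at or under the free-tier ceiling.
function fitsFreeTier(dailyActiveUsers: number, requestsPerUser: number): boolean {
  return estimateDailyRequests(dailyActiveUsers, requestsPerUser) <= FREE_TIER_REQUESTS_PER_DAY;
}

console.log(fitsFreeTier(50, 40));    // true: 2,000 requests/day
console.log(fitsFreeTier(5_000, 40)); // false: 200,000 requests/day
```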
5. What we learned
Five things we’d take to the next project:
1. Remote debugging on Workers is harder. Wrangler tail helps.
There’s no SSH into a Worker. wrangler tail streams live production logs. Pair it with structured logging (JSON with requestId, userId, latency).
2. Secrets, vars, and bindings have nuances.
API keys go in secrets. Public config in vars. Resources like D1 or KV go in bindings (direct runtime references). Secrets aren’t accessible via wrangler dev without a .dev.vars file.
3. D1 doesn’t accept everything SQLite accepts.
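A migration that worked locally broke on D1 because it used ALTER TABLE ... ADD COLUMN with constraints D1 rejects. Test every migration with wrangler d1 migrations apply --local first, then against a non-production D1 database, since a local pass doesn't guarantee the remote one.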
4. AI Gateway saved us instrumentation work.
We didn't need to implement token logging, caching, or rate limiting manually. All of it is configured on the gateway itself; the Worker only needs the gateway URL and credentials, which we keep in environment variables. Saved days of work.
5. Hono + Zod is a solid combo for secure edge APIs.
Validation at the request edge with auto-inferred types eliminates runtime errors without adding noticeable latency.
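For the structured logging mentioned in point 1, a minimal sketch could emit one JSON line per request with the fields listed there (requestId, userId, latency). The `RequestLog` shape and helper names are illustrative, and `path` and `status` are extra fields we assume would be useful.

```typescript
// One log entry per request; field names beyond requestId, userId,
// and latency are assumptions for this sketch.
interface RequestLog {
  requestId: string;
  userId: string | null; // null for anonymous requests
  path: string;
  status: number;
  latencyMs: number;
}

// Serialize one entry as a single JSON line, easy to filter later.
function formatLog(entry: RequestLog): string {
  return JSON.stringify(entry);
}

// Compute latency from a start timestamp and emit the line.
function logRequest(
  requestId: string,
  userId: string | null,
  path: string,
  status: number,
  startedAt: number,
): string {
  const line = formatLog({
    requestId,
    userId,
    path,
    status,
    latencyMs: Date.now() - startedAt,
  });
  console.log(line); // console output is what `wrangler tail` streams
  return line;
}
```

Keeping each entry on one line means the output of wrangler tail can be piped into a JSON-aware filter to follow a single requestId.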
6. Conclusion
Building on the edge with Cloudflare Workers let us validate the product without the fixed costs of a traditional SaaS. The stack is lean, deployment is wrangler deploy, and the infra bill doesn’t scare you.
Is this for every SaaS? No. If you need heavy computation, complex queues, or a database with all the features of PostgreSQL, the edge stack has limitations. But for validation in markets where every dollar counts, it works.
The tool we built is called Propoza — an AI-powered proposal generator for freelancers. It’s free to use.
If you’ve used this stack or have a different approach to low-cost SaaS infrastructure, I’d love to hear about it in the comments. The goal here is to share experience, not to sell technology.