I built a Serverless OpenAI Gateway to cut costs by 30% and sanitize PII (Open Source)
Source: Dev.to
If you are building LLM wrappers or internal tools, you probably noticed two things killing your margins (and your sleep):
- Redundant API costs – users ask the same questions repeatedly, forcing you to pay OpenAI for the same tokens over and over.
- Compliance anxiety – a user might paste a client’s name, email, or tax ID into your chatbot, which then gets sent to a third‑party server (OpenAI, DeepSeek, etc.).
Most existing solutions are heavy enterprise gateways (Java/Docker) or expensive SaaS offerings. I decided to engineer a lightweight, serverless alternative that runs entirely on the edge with Cloudflare Workers.
Solution Overview
Sanitiza.AI is an open‑source gateway that caches responses to repeated requests and scrubs PII from prompts before they leave your network. The goal was zero DevOps: no Docker containers, no Redis instances—just pure serverless functions.
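To make the scrubbing step concrete, here is a minimal sketch of regex‑based redaction at the edge. The patterns and placeholder tokens are my own assumptions for illustration, not the project's actual rules (and detecting names reliably would need NER, not regex):

```typescript
// Hypothetical PII scrubber: replace sensitive substrings with placeholder
// tokens before the prompt is forwarded to a third-party API.
function scrubPII(text: string): string {
  return text
    // email addresses -> [EMAIL]
    .replace(/[\w.+-]+@[\w-]+\.[\w.-]+/g, '[EMAIL]')
    // US SSN-style tax IDs (###-##-####) -> [SSN]
    .replace(/\b\d{3}-\d{2}-\d{4}\b/g, '[SSN]');
}
```

A gateway would run this on the request body before the upstream `fetch`, so the third‑party provider only ever sees the redacted text.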
Runtime & Stack
| Component | Choice |
|---|---|
| Runtime | Cloudflare Workers (TypeScript) |
| Framework | Hono (lightweight web framework, similar to Express) |
| Storage | Cloudflare KV (key‑value store for caching) |
| Hashing | Native Web Crypto API (SHA‑256) |
Smart Cache (for RAG apps)
Redundancy is huge in Retrieval‑Augmented Generation (RAG) workloads. The gateway creates a SHA‑256 hash of the request body (prompt + system instructions) and uses it as a cache key.
```typescript
// Generate a unique fingerprint for the request
async function generateHash(message: string): Promise<string> {
  const msgBuffer = new TextEncoder().encode(message);
  const hashBuffer = await crypto.subtle.digest('SHA-256', msgBuffer);
  const hashArray = Array.from(new Uint8Array(hashBuffer));
  return hashArray.map(b => b.toString(16).padStart(2, '0')).join('');
}
```
Cache flow
| Result | Action | Cost |
|---|---|---|
| Hit | Return the stored JSON instantly | $0 |
| Miss | Forward the request to the upstream API and cache the response | Standard token pricing |

Live calculator: see the repository for a working demo.
Performance on the Edge
| Metric | Value |
|---|---|
| Cold start | ~0 ms (practically instantaneous) |
| Cache-hit response time | Served directly from edge KV, no upstream round trip |
It works with OpenAI, DeepSeek, Groq, and any other OpenAI-compatible API.
Contributing
I’m looking for contributors to implement semantic caching (using Cloudflare Vectorize) to catch prompts that are similar but not identical. If you have experience with Rust/WASM or vector databases, let’s talk!
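For anyone curious what semantic caching would look like, here is a toy sketch of the matching step. The 0.92 threshold, the in-memory scan, and all names are placeholders of mine; in the real feature, Vectorize would store the embeddings and perform the nearest-neighbor search:

```typescript
// Cosine similarity between two embedding vectors (assumed equal length).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] ** 2;
    nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Reuse a cached answer when a prompt's embedding is close enough to a
// previously seen one, even if the text is not byte-identical.
function findSemanticHit(
  query: number[],
  entries: { embedding: number[]; response: string }[],
  threshold = 0.92,
): string | null {
  for (const e of entries) {
    if (cosineSimilarity(query, e.embedding) >= threshold) return e.response;
  }
  return null;
}
```

The interesting engineering questions are exactly the ones a contributor would tackle: choosing the embedding model, tuning the threshold, and avoiding false hits on prompts that are lexically similar but semantically different.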
If you find the project useful, please ⭐ the repository—it helps a lot!