I built a Serverless OpenAI Gateway to cut costs by 30% and sanitize PII (Open Source)
Source: Dev.to
If you are building LLM wrappers or internal tools, you probably noticed two things killing your margins (and your sleep):
- Redundant API costs – users ask the same questions repeatedly, forcing you to pay OpenAI for the same tokens over and over.
- Compliance anxiety – a user might paste a client’s name, email, or tax ID into your chatbot, which then gets sent to a third‑party server (OpenAI, DeepSeek, etc.).
Most existing solutions are heavy enterprise gateways (Java/Docker) or expensive SaaS offerings. I decided to engineer a lightweight, serverless alternative that runs entirely on the edge with Cloudflare Workers.
Solution Overview
Sanitiza.AI is an open‑source gateway that caches responses to repeated requests and scrubs PII from prompts before they leave your network. The goal was zero DevOps: no Docker containers, no Redis instances—just pure serverless functions.
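To make the scrubbing step concrete, here is a minimal sketch of regex‑based redaction at the edge. The patterns and placeholder tokens are my own assumptions for illustration, not the project's actual rules (and detecting names reliably would need NER, not regex):

```typescript
// Hypothetical PII scrubber: replace sensitive substrings with placeholder
// tokens before the prompt is forwarded to a third-party API.
function scrubPII(text: string): string {
  return text
    // email addresses -> [EMAIL]
    .replace(/[\w.+-]+@[\w-]+\.[\w.-]+/g, '[EMAIL]')
    // US SSN-style tax IDs (###-##-####) -> [SSN]
    .replace(/\b\d{3}-\d{2}-\d{4}\b/g, '[SSN]');
}
```

A gateway would run this on the request body before the upstream `fetch`, so the third‑party provider only ever sees the redacted text.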
Runtime & Stack
| Component | Choice |
|---|---|
| Runtime | Cloudflare Workers (TypeScript) |
| Framework | Hono (lightweight web framework, similar to Express) |
| Storage | Cloudflare KV (key‑value store for caching) |
| Hashing | Native Web Crypto API (SHA‑256) |
Smart Cache (for RAG apps)
Redundancy is huge in Retrieval‑Augmented Generation (RAG) workloads. The gateway creates a SHA‑256 hash of the request body (prompt + system instructions) and uses it as a cache key.
```typescript
// Generate a unique fingerprint for the request
async function generateHash(message: string): Promise<string> {
  const msgBuffer = new TextEncoder().encode(message);
  const hashBuffer = await crypto.subtle.digest('SHA-256', msgBuffer);
  const hashArray = Array.from(new Uint8Array(hashBuffer));
  return hashArray.map(b => b.toString(16).padStart(2, '0')).join('');
}
```
Cache flow
| Result | Action | Cost |
|---|---|---|
| Hit | Return the stored JSON instantly | $0 |
| Miss | Forward the request to the upstream API and cache the response | Standard token pricing |

Live calculator: see the repository for a working demo.
Performance on the Edge
| Metric | Value |
|---|---|
| Cold start | ~0 ms (practically instantaneous) |
| Cache-hit response time | Served directly from edge KV, no upstream round trip |
It works with OpenAI, DeepSeek, Groq, and any other OpenAI-compatible API.
Contributing
I’m looking for contributors to implement semantic caching (using Cloudflare Vectorize) to catch prompts that are similar but not identical. If you have experience with Rust/WASM or vector databases, let’s talk!
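For anyone curious what semantic caching would look like, here is a toy sketch of the matching step. The 0.92 threshold, the in-memory scan, and all names are placeholders of mine; in the real feature, Vectorize would store the embeddings and perform the nearest-neighbor search:

```typescript
// Cosine similarity between two embedding vectors (assumed equal length).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] ** 2;
    nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Reuse a cached answer when a prompt's embedding is close enough to a
// previously seen one, even if the text is not byte-identical.
function findSemanticHit(
  query: number[],
  entries: { embedding: number[]; response: string }[],
  threshold = 0.92,
): string | null {
  for (const e of entries) {
    if (cosineSimilarity(query, e.embedding) >= threshold) return e.response;
  }
  return null;
}
```

The interesting engineering questions are exactly the ones a contributor would tackle: choosing the embedding model, tuning the threshold, and avoiding false hits on prompts that are lexically similar but semantically different.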
If you find the project useful, please ⭐ the repository—it helps a lot!