I built a Serverless OpenAI Gateway to cut costs by 30% and sanitize PII (Open Source)

Published: January 30, 2026 at 08:14 PM EST
2 min read
Source: Dev.to

If you are building LLM wrappers or internal tools, you probably noticed two things killing your margins (and your sleep):

  • Redundant API costs – users ask the same questions repeatedly, forcing you to pay OpenAI for the same tokens over and over.
  • Compliance anxiety – a user might paste a client’s name, email, or tax ID into your chatbot, which then gets sent to a third‑party server (OpenAI, DeepSeek, etc.).

Most existing solutions are heavy enterprise gateways (Java/Docker) or expensive SaaS offerings. I decided to engineer a lightweight, serverless alternative that runs entirely on the Edge with Cloudflare Workers.

Solution Overview

Sanitiza.AI is an open‑source gateway that caches requests and scrubs PII before they leave your network. The goal was zero DevOps: no Docker containers, no Redis instances—just pure serverless functions.
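Conceptually, the scrubbing step is a single pass over the request body that replaces sensitive substrings before the payload leaves your network. A minimal sketch of that idea (the patterns and replacement tokens here are illustrative, not the repository's actual rules):

```typescript
// Illustrative PII redaction pass. Real-world rules need more patterns
// and care (names, addresses, locale-specific IDs); see the repo.
const PII_PATTERNS: [RegExp, string][] = [
  [/[\w.+-]+@[\w-]+\.[\w.]+/g, '[EMAIL]'],  // email addresses
  [/\b\d{3}-\d{2}-\d{4}\b/g, '[SSN]'],      // US SSN-style IDs
  [/\b(?:\d[ -]?){13,16}\b/g, '[CARD]'],    // card-number-like digit runs
];

function scrubPII(text: string): string {
  // Apply each pattern in order, replacing matches with a placeholder token.
  return PII_PATTERNS.reduce(
    (acc, [pattern, token]) => acc.replace(pattern, token),
    text,
  );
}
```

Because the scrubbing runs inside the Worker, the raw PII never reaches the upstream provider.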

Runtime & Stack

| Component | Choice |
|---|---|
| Runtime | Cloudflare Workers (TypeScript) |
| Framework | Hono (lightweight web framework, similar to Express) |
| Storage | Cloudflare KV (key-value store for caching) |
| Hashing | Native Web Crypto API (SHA-256) |
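On Cloudflare, wiring this stack together is mostly configuration. A minimal `wrangler.toml` sketch (the project name, binding name, and namespace ID below are placeholders, not the repo's actual values):

```toml
name = "openai-gateway"
main = "src/index.ts"
compatibility_date = "2024-01-01"

# Bind a KV namespace for the response cache
[[kv_namespaces]]
binding = "CACHE"
id = "<your-kv-namespace-id>"
```

With the binding in place, the Worker reads and writes the cache through `env.CACHE` with no Redis or Docker involved.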

Smart Cache (for RAG apps)

Redundancy is huge in Retrieval‑Augmented Generation (RAG) workloads. The gateway creates a SHA‑256 hash of the request body (prompt + system instructions) and uses it as a cache key.

// Generate a unique fingerprint for the request
async function generateHash(message: string): Promise<string> {
  const msgBuffer = new TextEncoder().encode(message);
  const hashBuffer = await crypto.subtle.digest('SHA-256', msgBuffer);
  const hashArray = Array.from(new Uint8Array(hashBuffer));
  return hashArray.map(b => b.toString(16).padStart(2, '0')).join('');
}

Cache flow

| Result | Action | Cost |
|---|---|---|
| Hit | Return the stored JSON instantly | ~$0 (no upstream call) |
| Miss | Forward the request upstream and cache the response | Full token cost, paid once |

Live calculator: see the repository for a working demo.
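The flow is a standard cache-aside pattern over KV. A sketch of the handler logic (the binding name `CACHE`, the 1-hour TTL, and the function names are assumptions for illustration, typed against a minimal KV-like interface):

```typescript
// Minimal subset of Cloudflare's KVNamespace interface used here.
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, opts?: { expirationTtl?: number }): Promise<void>;
}

async function cachedCompletion(
  kv: KVLike,
  cacheKey: string,                      // e.g. the SHA-256 of the request body
  callUpstream: () => Promise<string>,   // the paid OpenAI-compatible call
): Promise<{ body: string; cacheHit: boolean }> {
  const cached = await kv.get(cacheKey);
  if (cached !== null) {
    // Hit: return the stored JSON instantly, no tokens billed.
    return { body: cached, cacheHit: true };
  }
  // Miss: pay for the tokens once, then cache the response for an hour.
  const fresh = await callUpstream();
  await kv.put(cacheKey, fresh, { expirationTtl: 3600 });
  return { body: fresh, cacheHit: false };
}
```

Every identical prompt after the first is served from the edge for free until the TTL expires.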

Performance on the Edge

| Metric | Value |
|---|---|
| Cold start | ~0 ms (practically instantaneous) |
| Cache response time | — |

It works with OpenAI, DeepSeek, Groq, and any other compatible API.
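Since those providers all speak the OpenAI chat-completions wire format, switching between them is just a base-URL change. A sketch of how that routing might look (the base URLs are the providers' public OpenAI-compatible endpoints; the function and map names are illustrative):

```typescript
// OpenAI-compatible base URLs per provider.
const PROVIDERS: Record<string, string> = {
  openai: 'https://api.openai.com/v1',
  deepseek: 'https://api.deepseek.com/v1',
  groq: 'https://api.groq.com/openai/v1',
};

function upstreamUrl(provider: string, path = '/chat/completions'): string {
  const base = PROVIDERS[provider];
  if (!base) throw new Error(`Unknown provider: ${provider}`);
  return base + path;
}
```

The request body and `Authorization: Bearer <key>` header pass through unchanged regardless of the provider chosen.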

Contributing

I’m looking for contributors to implement semantic caching (using Cloudflare Vectorize) to catch prompts that are similar but not identical. If you have experience with Rust/WASM or vector databases, let’s talk!

If you find the project useful, please ⭐ the repository—it helps a lot!
