Your Secrets Stay Local: Building a Privacy-First Mental Health AI with WebLLM and WebGPU
Source: Dev.to
In the era of massive cloud‑based LLMs, privacy remains the “elephant in the room.” This is especially true for mental‑health and psychological‑counseling applications, where user data isn’t just “personal”—it’s deeply sensitive. Sending a transcript of a therapy session to a third‑party API can feel like a breach of trust.
But what if the AI lived entirely inside the user’s browser? 🤯
Today we dive into WebLLM sentiment analysis and privacy‑first AI engineering. By leveraging WebGPU‑based local LLM capabilities, we can build a sentiment‑analysis engine for counseling that runs at near‑native speeds—without a single byte of text ever leaving the client’s machine.
The Architecture: 100 % Client‑Side Inference
Traditional AI apps act as thin clients for a heavy backend. Our approach flips the script. Using TVM.js and WebGPU, we turn the browser into a high‑performance inference engine.
graph TD
    User((User Input)) --> ReactUI[React Frontend]
    ReactUI --> EngineInit{Engine Initialized?}
    EngineInit -- No --> WebLLM[WebLLM / TVM.js Runtime]
    WebLLM --> ModelCache[(IndexedDB Model Cache)]
    ModelCache --> WebLLM
    EngineInit -- Yes --> LocalInference[Local WebGPU Inference]
    LocalInference --> SentimentOutput[Sentiment Analysis Result]
    SentimentOutput --> ReactUI

    subgraph Browser Sandbox
        WebLLM
        ModelCache
        LocalInference
    end
Prerequisites
To follow this intermediate‑level tutorial you’ll need:
- React (Vite is recommended)
- WebLLM SDK – the bridge between the browser and LLMs
- WebGPU‑compatible browser – latest Chrome or Edge
- A decent GPU – even integrated chips work wonders with WebGPU
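Since the model download is measured in gigabytes, it's worth feature-detecting WebGPU before offering to load anything. A minimal sketch (the `hasWebGPU` helper name is my own, not part of the WebLLM SDK):

```typescript
// Hypothetical helper: detect WebGPU support before downloading a model.
// `navigator.gpu` only exists in WebGPU-capable browsers (e.g. Chrome/Edge 113+).
export async function hasWebGPU(): Promise<boolean> {
  if (typeof navigator === "undefined") return false;
  const gpu = (navigator as any).gpu;
  if (!gpu) return false;
  try {
    // requestAdapter() can still resolve to null (e.g. GPU blocklisted),
    // so check the result explicitly rather than just the API's presence.
    return (await gpu.requestAdapter()) !== null;
  } catch {
    return false;
  }
}
```

Call this on mount and swap the "Load Local Model" button for a graceful fallback message when it resolves to `false`.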
Step 1: Setting Up the WebLLM Engine
First, install the SDK:
npm install @mlc-ai/web-llm
The core of our privacy‑preserving app is the MLCEngine. We’ll initialize it inside a Web Worker and load a quantized model (e.g., Llama‑3 or Mistral) compiled for web execution.
import { CreateWebWorkerMLCEngine, MLCEngineInterface } from "@mlc-ai/web-llm";
import { useState } from "react";

// Custom hook to manage the LLM lifecycle
export function useLocalLLM() {
  const [engine, setEngine] = useState<MLCEngineInterface | null>(null);
  const [loadingProgress, setLoadingProgress] = useState(0);

  const initEngine = async () => {
    // Run inference in a Web Worker to keep the UI thread buttery smooth 🧈
    const worker = new Worker(
      new URL("./worker.ts", import.meta.url),
      { type: "module" }
    );

    const newEngine = await CreateWebWorkerMLCEngine(
      worker,
      // The model ID must match an entry in web-llm's prebuilt model list
      "Llama-3-8B-Instruct-q4f16_1-MLC",
      {
        initProgressCallback: (report) => {
          setLoadingProgress(Math.round(report.progress * 100));
        },
      }
    );
    setEngine(newEngine);
  };

  return { engine, loadingProgress, initEngine };
}
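The hook above references a `./worker.ts` file that we haven't defined yet. It is a thin piece of glue: a minimal sketch, assuming the current `@mlc-ai/web-llm` worker API:

```typescript
// worker.ts — runs the model off the main thread.
// WebWorkerMLCEngineHandler proxies messages from the UI thread's
// CreateWebWorkerMLCEngine call to an engine living inside this worker.
import { WebWorkerMLCEngineHandler } from "@mlc-ai/web-llm";

const handler = new WebWorkerMLCEngineHandler();

self.onmessage = (msg: MessageEvent) => {
  handler.onmessage(msg);
};
```

All heavy lifting (weight loading, GPU kernels, token generation) happens here, so the React UI never blocks.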
Step 2: The “Counselor” Prompt Engineering
For psychological sentiment analysis we need more nuance than a simple “Positive/Negative.” The system prompt stays entirely in the browser’s memory.
const SYSTEM_PROMPT = `
You are a local, privacy‑focused mental health assistant.
Analyze the user's input for emotional tone, cognitive distortions, and sentiment.
Provide a structured JSON output with the following keys:
- sentiment: (String: 'Calm', 'Anxious', 'Depressed', 'Joyful')
- intensity: (Number: 1‑10)
- feedback: (String: A supportive, empathetic response)
IMPORTANT: Do not suggest medical diagnoses.
`;
import type { MLCEngineInterface } from "@mlc-ai/web-llm";

const analyzeSentiment = async (engine: MLCEngineInterface, userInput: string) => {
  const messages = [
    { role: "system" as const, content: SYSTEM_PROMPT },
    { role: "user" as const, content: userInput },
  ];

  const reply = await engine.chat.completions.create({
    messages,
    temperature: 0.7,
    // Ask the model for a JSON object (web-llm's JSON mode)
    response_format: { type: "json_object" },
  });

  // content can be null in the OpenAI-style type, so fall back to "{}"
  return JSON.parse(reply.choices[0].message.content ?? "{}");
};
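Even with JSON mode, LLM output can be incomplete or wrapped in stray text, and in a mental-health context you don't want a crash on a malformed reply. A small validation layer helps; this is a sketch of my own (the `parseSentimentReply` helper is not from the article), with the schema keys mirroring the system prompt above:

```typescript
type SentimentResult = {
  sentiment: "Calm" | "Anxious" | "Depressed" | "Joyful";
  intensity: number;
  feedback: string;
};

const SENTIMENTS = ["Calm", "Anxious", "Depressed", "Joyful"] as const;

// Coerce a raw model reply into a safe SentimentResult.
// Falls back to neutral defaults when the JSON is malformed or incomplete.
function parseSentimentReply(raw: string): SentimentResult {
  let data: any = {};
  try {
    data = JSON.parse(raw);
  } catch {
    // The model occasionally wraps JSON in prose; try to extract the first object.
    const match = raw.match(/\{[\s\S]*\}/);
    if (match) {
      try { data = JSON.parse(match[0]); } catch { /* keep defaults */ }
    }
  }
  const sentiment = SENTIMENTS.includes(data.sentiment) ? data.sentiment : "Calm";
  // Clamp intensity into the 1-10 range promised by the prompt.
  const intensity = Math.min(10, Math.max(1, Math.round(Number(data.intensity) || 1)));
  const feedback = typeof data.feedback === "string" ? data.feedback : "";
  return { sentiment, intensity, feedback };
}
```

Wrapping the `JSON.parse` call in `analyzeSentiment` with a validator like this keeps the UI stable no matter what the model emits.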
The “Official” Way to Scale
Building local‑first apps is empowering, but productionizing these patterns requires deeper knowledge of edge computing and data synchronization. For advanced architectural patterns and production‑ready examples of private AI systems, check out the technical deep‑dives at WellAlly Blog. Topics include optimized model quantization and secure local‑storage strategies that complement the WebLLM workflow.
Step 3: Integrating with React
Finally, we build a simple UI where users can vent, knowing their data is “air‑gapped” by the browser sandbox.
function SentimentApp() {
  const { engine, loadingProgress, initEngine } = useLocalLLM();
  const [input, setInput] = useState("");
  const [result, setResult] = useState<any>(null);

  const handleAnalyze = async () => {
    if (!engine) return;
    const analysis = await analyzeSentiment(engine, input);
    setResult(analysis);
  };

  return (
    <div className="p-4">
      <h1 className="text-xl font-bold mb-4">
        SafeSpace: Local AI Counseling 🛡️
      </h1>

      {!engine ? (
        <button
          onClick={initEngine}
          className="bg-blue-500 text-white px-4 py-2 rounded"
        >
          Load Local Model ({loadingProgress}%)
        </button>
      ) : (
        <>
          <textarea
            value={input}
            onChange={(e) => setInput(e.target.value)}
            rows={6}
            className="w-full border rounded p-2 mb-2"
            placeholder="Enter your thoughts..."
          />
          <button
            onClick={handleAnalyze}
            className="bg-green-600 text-white px-4 py-2 rounded"
          >
            Analyze Sentiment
          </button>
          {result && (
            <div className="mt-4 p-4 border rounded bg-gray-50">
              <p><strong>Sentiment:</strong> {result.sentiment}</p>
              <p><strong>Intensity:</strong> {result.intensity}/10</p>
              <p><strong>Feedback:</strong> {result.feedback}</p>
            </div>
          )}
        </>
      )}
    </div>
  );
}
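An 8B model on integrated graphics can take several seconds to produce a full reply, so streaming tokens as they arrive keeps the UI feeling alive. A sketch assuming web-llm's OpenAI-style streaming (`stream: true` yields chunks carrying `delta.content`); the `streamFeedback` helper name is my own:

```typescript
// Sketch: stream the reply token-by-token instead of awaiting the full text.
// `engine` is expected to expose an OpenAI-style chat.completions.create.
export async function streamFeedback(
  engine: { chat: { completions: { create: (req: any) => Promise<AsyncIterable<any>> } } },
  messages: { role: string; content: string }[],
  onToken: (token: string) => void
) {
  const chunks = await engine.chat.completions.create({ messages, stream: true });
  for await (const chunk of chunks) {
    const delta = chunk.choices?.[0]?.delta?.content ?? "";
    if (delta) onToken(delta);
  }
}
```

In the component, pass `(token) => setFeedback((prev) => prev + token)` as `onToken` to render the supportive response as it is generated.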
Why This Matters
- Zero Network Latency (Post‑Load): Once the weights are downloaded, WebLLM caches them in browser storage (the Cache API or IndexedDB), so subsequent sessions skip the download and inference runs at the speed of the user’s hardware, with no round trips to a server.
- Cost Efficiency: You aren’t paying $0.01 per 1k tokens to OpenAI. The user provides the compute! 🥑
- Trust: For apps dealing with trauma, addiction, or grief, being able to prove that “we literally cannot see your data” is a massive competitive advantage.
Conclusion
WebLLM and WebGPU are turning browsers into powerful AI workstations. By moving the “brain” to the client, we solve the ultimate privacy paradox in mental‑health tech.
Are you ready to move your inference to the edge? Drop a comment below if you’ve experimented with WebGPU or if you have questions about model quantization!
Keep coding, keep building, and stay private. 🚀
For more advanced guides on building secure, high‑performance web applications, don’t forget to visit the WellAlly Blog.