Your Secrets Stay Local: Building a Privacy-First Mental Health AI with WebLLM and WebGPU

Published: 2 months ago (February 15, 2026, 09:15 GMT+8)

7 min read

Source: Dev.to

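In the era of large-scale, cloud-based LLMs, privacy remains the elephant in the room. This is especially true for mental health and counseling applications, where user data is not merely "personal"; it is extremely sensitive. Sending the transcript of a therapy session to a third-party API can feel like a breach of trust.

But what if the AI ran entirely inside the user's browser? 🤯

Today we take a deep dive into WebLLM sentiment analysis and privacy-first AI engineering. By leveraging WebGPU-based local LLM capabilities, we can build a sentiment analysis engine for counseling that runs at near-native speed, while the text itself never leaves the client machine.

Architecture: 100% Client-Side Inference

Traditional AI applications typically act as thin clients in front of a heavy backend. Our approach flips that idea around. With TVM.js and WebGPU, we turn the browser into a high-performance inference engine.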

graph TD
    User((User Input)) --> ReactUI[React Frontend]
    ReactUI --> EngineInit{Engine Initialized?}
    EngineInit -- No --> WebLLM[WebLLM / TVM.js Runtime]
    WebLLM --> ModelCache[(IndexedDB Model Cache)]
    ModelCache --> WebLLM
    EngineInit -- Yes --> LocalInference[Local WebGPU Inference]
    LocalInference --> SentimentOutput[Sentiment Analysis Result]
    SentimentOutput --> ReactUI
    subgraph Browser Sandbox
        WebLLM
        ModelCache
        LocalInference
    end
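The "Engine Initialized?" branch in the diagram is essentially an initialize-once, reuse-afterwards pattern. A minimal sketch of that idea follows; `createLazyLoader` and `factory` are hypothetical names used for illustration and are not part of WebLLM.

```typescript
// Generic "load once, reuse afterwards" helper.
// `factory` stands in for whatever actually creates the engine (hypothetical here).
function createLazyLoader<T>(factory: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | null = null;

  return () => {
    // Already loading or loaded: hand back the same promise to every caller.
    if (pending !== null) return pending;

    const started = factory().catch((err) => {
      // Forget a failed load so that a later call can retry.
      pending = null;
      throw err;
    });
    pending = started;
    return started;
  };
}
```

In the app, `factory` would wrap the engine creation call, so that repeated clicks or re-renders share a single model load instead of starting several.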

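Prerequisites

React (Vite recommended)
WebLLM SDK – the bridge between the browser and large language models
A WebGPU-compatible browser – a recent version of Chrome or Edge
A decent GPU – even integrated graphics can work wonders with WebGPU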
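Because the browser and GPU requirements above decide whether the app can run at all, it may help to check for WebGPU before downloading a model. The sketch below relies on the standard `navigator.gpu.requestAdapter()` call; the `NavigatorLike` and `GpuLike` types and the `checkWebGpu` helper are assumptions introduced here so the example compiles without extra type packages.

```typescript
// Minimal structural types so the sketch compiles without @webgpu/types.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

type WebGpuStatus = "ok" | "no-api" | "no-adapter";

// Reports whether WebGPU is exposed and whether a usable adapter exists.
async function checkWebGpu(nav: NavigatorLike): Promise<WebGpuStatus> {
  if (!nav.gpu) return "no-api"; // browser does not expose WebGPU
  const adapter = await nav.gpu.requestAdapter();
  return adapter ? "ok" : "no-adapter"; // null means no suitable GPU was found
}

// In a browser: const status = await checkWebGpu(navigator as NavigatorLike);
```

Anything other than "ok" is a good moment to show a friendly message instead of starting a multi-gigabyte download.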

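Step 1: Set Up the WebLLM Engine

First, install the SDK:

npm install @mlc-ai/web-llm

The heart of our privacy-preserving application is the Engine. We initialize it and load a quantized model (for example Llama-3 or Mistral) that has been optimized for execution on the web.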

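// Recent WebLLM releases export CreateWebWorkerMLCEngine
// (older releases named it CreateWebWorkerEngine).
import { CreateWebWorkerMLCEngine } from "@mlc-ai/web-llm";
import type { MLCEngineInterface } from "@mlc-ai/web-llm";
import { useState } from "react";

// Custom hook to manage the LLM lifecycle
export function useLocalLLM() {
  const [engine, setEngine] = useState<MLCEngineInterface | null>(null);
  const [loadingProgress, setLoadingProgress] = useState(0);

  const initEngine = async () => {
    // Use a Web Worker to keep the UI thread buttery smooth 🧈
    // (worker.ts is expected to host WebLLM's worker-side message handler).
    const worker = new Worker(
      new URL("./worker.ts", import.meta.url),
      { type: "module" }
    );

    // Downloads the model on first use; progress is reported as a 0..1 fraction.
    // Verify the model ID against the list shipped with your WebLLM version.
    const loadedEngine = await CreateWebWorkerMLCEngine(
      worker,
      "Llama-3-8B-Instruct-v0.1-q4f16_1-MLC",
      {
        initProgressCallback: (report) => {
          setLoadingProgress(Math.round(report.progress * 100));
        },
      }
    );
    setEngine(loadedEngine);
  };

  return { engine, loadingProgress, initEngine };
}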
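To get a feel for why a quantized model matters in the browser, here is a back-of-the-envelope estimate of the weight size. It assumes that the "8B" in the model ID means roughly 8 billion parameters and that the q4f16_1 suffix means 4-bit weights, which is our reading of MLC's naming. The `estimateWeightBytes` helper is hypothetical, and the result ignores quantization scales, the KV cache, and runtime overhead, so treat it as a lower bound.

```typescript
// Rough size of the model weights alone, in bytes.
function estimateWeightBytes(paramCount: number, bitsPerWeight: number): number {
  return (paramCount * bitsPerWeight) / 8;
}

const GB = 1e9;
// Assumed: about 8 billion parameters, stored at 4 bits vs. 16 bits each.
const quantized = estimateWeightBytes(8e9, 4) / GB;
const halfPrecision = estimateWeightBytes(8e9, 16) / GB;

console.log(`4-bit: ~${quantized} GB, 16-bit: ~${halfPrecision} GB`);
// prints "4-bit: ~4 GB, 16-bit: ~16 GB"
```

Even at 4 bits the first download is several gigabytes, which is why the progress callback and the local model cache are worth surfacing in the UI.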

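Step 2: Prompt Engineering for the "Counselor"

For psychological sentiment analysis, we need finer distinctions than a simple "positive/negative" label. The system prompt lives entirely in the browser's memory.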

import type { MLCEngineInterface } from "@mlc-ai/web-llm";

const SYSTEM_PROMPT = `
You are a local, privacy-focused mental health assistant.
Analyze the user's input for emotional tone, cognitive distortions, and sentiment.
Provide a structured JSON output with the following keys:
- sentiment: (String: 'Calm', 'Anxious', 'Depressed', 'Joyful')
- intensity: (Number: 1-10)
- feedback: (String: A supportive, empathetic response)

IMPORTANT: Do not suggest medical diagnoses.
`;

export const analyzeSentiment = async (
  engine: MLCEngineInterface,
  userInput: string
) => {
  // "as const" keeps the role values as literal types for the chat API
  const messages = [
    { role: "system" as const, content: SYSTEM_PROMPT },
    { role: "user" as const, content: userInput },
  ];

  const reply = await engine.chat.completions.create({
    messages,
    temperature: 0.7,
    // Ask the engine to constrain the output to valid JSON
    response_format: { type: "json_object" },
  });

  // content may be null, so fall back to an empty object before parsing
  return JSON.parse(reply.choices[0].message.content ?? "{}");
};
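JSON mode guarantees valid JSON, not that the keys and values match what the prompt asked for. A minimal sketch of validating the parsed reply against the three keys from the system prompt follows; `parseAnalysis` and the `SentimentAnalysis` type are hypothetical helpers, not part of WebLLM.

```typescript
const SENTIMENTS = ["Calm", "Anxious", "Depressed", "Joyful"] as const;
type Sentiment = (typeof SENTIMENTS)[number];

interface SentimentAnalysis {
  sentiment: Sentiment;
  intensity: number; // score from 1 to 10
  feedback: string;
}

// Returns a validated result, or null when the reply is not usable.
function parseAnalysis(raw: string): SentimentAnalysis | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not valid JSON at all
  }
  if (typeof data !== "object" || data === null) return null;

  const { sentiment, intensity, feedback } = data as Record<string, unknown>;

  // Only accept one of the four labels listed in the system prompt.
  const knownSentiment = SENTIMENTS.find((s) => s === sentiment);
  if (knownSentiment === undefined) return null;
  if (typeof intensity !== "number" || intensity < 1 || intensity > 10) return null;
  if (typeof feedback !== "string" || feedback.trim() === "") return null;

  return { sentiment: knownSentiment, intensity, feedback };
}
```

Returning null instead of throwing lets the UI decide whether to retry the request or show a gentle "could not analyze that" message.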

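The "Official" Way to Scale

Building local-first apps is exciting, but taking these patterns to production requires deeper knowledge of edge computing and data synchronization. For advanced architecture patterns and production-ready examples of private AI systems, see the technical deep dives on the WellAlly Blog. Topics include optimized model quantization and secure local storage strategies that complement a WebLLM workflow.

Step 3: Integrating with React

Finally, we build a simple UI where users can open up, knowing that their data is "air-gapped" by the browser sandbox.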

import { useState } from "react";
// useLocalLLM (Step 1) and analyzeSentiment (Step 2) must be in scope here.

// Shape of the JSON requested by the system prompt
type SentimentResult = { sentiment: string; intensity: number; feedback: string };

function SentimentApp() {
  const { engine, loadingProgress, initEngine } = useLocalLLM();
  const [input, setInput] = useState("");
  const [result, setResult] = useState<SentimentResult | null>(null);

  const handleAnalyze = async () => {
    if (!engine || input.trim() === "") return;
    try {
      const analysis = await analyzeSentiment(engine, input);
      setResult(analysis);
    } catch (err) {
      // Malformed model output or an engine error: clear the stale result
      console.error("Local analysis failed", err);
      setResult(null);
    }
  };

  return (
    <div className="p-4">
      <h1 className="text-xl font-bold mb-4">
        SafeSpace: Local AI Counseling 🛡️
      </h1>

      {!engine ? (
        <button
          onClick={initEngine}
          className="bg-blue-500 text-white px-4 py-2 rounded"
        >
          Load Local Model ({loadingProgress}%)
        </button>
      ) : (
        <>
          <textarea
            value={input}
            onChange={(e) => setInput(e.target.value)}
            rows={6}
            className="w-full border rounded p-2 mb-2"
            placeholder="Enter your thoughts..."
          />
          <button
            onClick={handleAnalyze}
            className="bg-green-600 text-white px-4 py-2 rounded"
          >
            Analyze Sentiment
          </button>

          {result && (
            <div className="mt-4 p-4 border rounded bg-gray-50">
              <p><strong>Sentiment:</strong> {result.sentiment}</p>
              <p><strong>Intensity:</strong> {result.intensity}/10</p>
              <p><strong>Feedback:</strong> {result.feedback}</p>
            </div>
          )}
        </>
      )}
    </div>
  );
}
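Local inference can take several seconds, so the UI benefits from explicit loading and error states rather than a single result value. One possible sketch is a small reducer; the state and action names are hypothetical, and it could be wired into the component with React's `useReducer`.

```typescript
type AnalysisState<T> =
  | { status: "idle" }
  | { status: "loading" }
  | { status: "done"; result: T }
  | { status: "error"; message: string };

type AnalysisAction<T> =
  | { type: "start" }
  | { type: "success"; result: T }
  | { type: "failure"; message: string };

// Pure state transition function: easy to test without rendering anything.
function analysisReducer<T>(
  state: AnalysisState<T>,
  action: AnalysisAction<T>
): AnalysisState<T> {
  switch (action.type) {
    case "start":
      return { status: "loading" };
    case "success":
      // Ignore results that arrive when no request is in flight.
      return state.status === "loading"
        ? { status: "done", result: action.result }
        : state;
    case "failure":
      return state.status === "loading"
        ? { status: "error", message: action.message }
        : state;
  }
}
```

With this in place, the analyze button can be disabled while the status is "loading", and an "error" status can render a retry prompt instead of a blank panel.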

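Additional snippet from the original

<>
  {/* Variant of the result UI; the element tags are reconstructed */}
  <button
    onClick={async () => setResult(await analyzeSentiment(engine, input))}
    className="mt-2 bg-green-600 text-white px-4 py-2 rounded"
  >
    Analyze Privately
  </button>

  {result && (
    <div>
      <h3>Analysis (Stayed in Browser ✅)</h3>
      <p><strong>Sentiment:</strong> {result.sentiment}</p>
      <p>"{result.feedback}"</p>
    </div>
  )}
</>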

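Why This Matters

No network latency (after loading): once the model is cached in IndexedDB (a TVM.js feature), inference runs at the speed of the user's hardware.
Cost efficiency: you don't pay OpenAI $0.01 per thousand tokens. The compute is provided by the user! 🥑
Trust: for apps that deal with trauma, addiction, or grief, being able to show that "we truly cannot see your data" is a huge competitive advantage.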
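To put the cost point in numbers, here is a small worked example using the $0.01 per thousand tokens figure mentioned above. The workload (1,000 users, 20 sessions each per month, 1,500 tokens per session) is an assumption chosen purely for illustration, and `estimateCloudCostUSD` is a hypothetical helper.

```typescript
// Hypothetical cloud bill: tokens processed times the price per 1,000 tokens.
function estimateCloudCostUSD(totalTokens: number, pricePer1kUSD = 0.01): number {
  return (totalTokens / 1000) * pricePer1kUSD;
}

// Assumed workload (illustrative, not measured):
// 1,000 users x 20 sessions per month x 1,500 tokens per session.
const monthlyTokens = 1000 * 20 * 1500; // 30,000,000 tokens

console.log(estimateCloudCostUSD(monthlyTokens).toFixed(2));
// prints "300.00"
```

With on-device inference, that per-token API fee drops to zero, because the user's own hardware does the work.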

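Conclusion

WebLLM and WebGPU are turning the browser into a powerful AI workstation. By moving the "brain" to the client, we tackle the central privacy paradox of mental health tech.

Are you ready to move inference to the edge? If you have already experimented with WebGPU, or have questions about model quantization, leave a comment below!

Keep coding, keep building, keep it private. 🚀

For more advanced guides on building secure, high-performance web apps, don't forget to visit the WellAlly Blog.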
