Glin Profanity: A Practical Toolkit for Content Moderation

Published: December 31, 2025, 07:48 GMT+8

8 min read

Original: Dev.to

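What Is Glin-Profanity?

Glin-Profanity is an open-source content moderation library for JavaScript/TypeScript and Python.
Unlike a plain word-list filter, it targets the evasion techniques users actually rely on:

Leetspeak substitutions – e.g. f4ck, 5h1t
Unicode homoglyphs – e.g. Cyrillic characters that look like Latin letters
Character-separation tricks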
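
To make the last two techniques concrete, here is a minimal sketch of how look-alike characters and separator tricks could be undone before a dictionary lookup. It is not Glin-Profanity's implementation: the function names, the tiny homoglyph table, and the three-letter minimum are all assumptions chosen for illustration.

```javascript
// Illustrative homoglyph table; real confusable sets are far larger.
const HOMOGLYPHS = new Map([
  ['\u0430', 'a'], // Cyrillic а
  ['\u0435', 'e'], // Cyrillic е
  ['\u043E', 'o'], // Cyrillic о
  ['\u0440', 'p'], // Cyrillic р
  ['\u0441', 'c'], // Cyrillic с
  ['\u057D', 'u'], // Armenian ս
]);

// Map look-alike characters to Latin letters after NFKC normalization.
function foldHomoglyphs(text) {
  return Array.from(text.normalize('NFKC'), ch => HOMOGLYPHS.get(ch) ?? ch).join('');
}

// Join runs of three or more single letters split by spaces or punctuation,
// e.g. "b.a.d" or "b a d". Shorter runs such as "a.b" are left alone.
function collapseSeparators(text) {
  return text.replace(/\b(?:[a-z][\s._*-]+){2,}[a-z]\b/gi, run =>
    run.replace(/[\s._*-]+/g, '')
  );
}

// Hypothetical pipeline: fold look-alikes, join split letters, lowercase.
function normalizeEvasion(text) {
  return collapseSeparators(foldHomoglyphs(text)).toLowerCase();
}

console.log(normalizeEvasion('B.\u0430.D word')); // "bad word"
```

Collapsing separators this way can also merge legitimate text such as spelled-out initials, so a real filter would presumably apply it only to candidates that then match a dictionary entry.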

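Key Features

Leetspeak and Unicode normalization (catches @$$, fսck, sh!t)
Built-in dictionaries for 23 languages
Optional ML-based toxicity detection via TensorFlow.js / TensorFlow
21M+ operations per second with LRU caching
Works in Node.js, browsers, and Python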

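Try It Live

Test the filter directly in your browser, no installation required.

Open the interactive demo

Quick Reference

Feature	JavaScript / TypeScript	Python
Install	`npm install glin-profanity`	`pip install glin-profanity`
Languages	23 supported	23 supported
Performance	21M ops/sec	Native C extension
ML support	TensorFlow.js	TensorFlow
Bundle size	~45 KB (tree-shakeable)	N/A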

Installation

JavaScript / TypeScript

npm install glin-profanity

Python

pip install glin-profanity

Optional ML toxicity support (JavaScript only)

npm install glin-profanity @tensorflow/tfjs

Code Templates

Template 1 – Basic Profanity Detection

JavaScript

import { checkProfanity } from 'glin-profanity';

const result = checkProfanity('user input here', {
  languages: ['english']
});

if (result.containsProfanity) {
  console.log('Blocked words:', result.profaneWords);
}

Python

from glin_profanity import Filter

filter = Filter({"languages": ["english"]})
result = filter.check_profanity("user input here")

if result.contains_profanity:
    print(f"Blocked words: {result.profane_words}")

Template 2 – Leetspeak and Unicode Evasion Detection

Catches: f4ck, 5h1t, @$$, fսck (the Armenian letter 'ս' standing in for 'u'), s.h.i.t

import { Filter } from 'glin-profanity';

const filter = new Filter({
  detectLeetspeak: true,
  leetspeakLevel: 'aggressive', // 'basic' | 'moderate' | 'aggressive'
  normalizeUnicode: true
});

filter.isProfane('f4ck');   // true
filter.isProfane('5h1t');   // true
filter.isProfane('@$$');   // true
filter.isProfane('fսck');   // true (homoglyph 'ս', Armenian U+057D)

Leetspeak levels

Level	Description
`basic`	Common substitutions (4→a, 3→e, 1→i, 0→o)
`moderate`	+ extended symbols (@→a, $→s, !→i)
`aggressive`	+ separated characters, mixed patterns
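
The level table can be read as a set of cumulative substitution maps. The sketch below illustrates that reading using only the substitutions listed above; `decodeLeet` and `tableFor` are hypothetical names, the library's real tables are presumably larger (the 5h1t example implies 5→s, which the table does not list), and the `aggressive` level's separator handling is not modeled here.

```javascript
// Substitution tables copied from the level descriptions above.
const BASIC = { '4': 'a', '3': 'e', '1': 'i', '0': 'o' };
const EXTENDED = { '@': 'a', '$': 's', '!': 'i' };

// Hypothetical helper: 'basic' uses BASIC only; any other level adds EXTENDED.
function tableFor(level) {
  return level === 'basic' ? BASIC : { ...BASIC, ...EXTENDED };
}

// Replace each mapped character and leave everything else unchanged.
function decodeLeet(text, level = 'moderate') {
  const table = tableFor(level);
  return Array.from(text.toLowerCase(), ch => table[ch] ?? ch).join('');
}

console.log(decodeLeet('h3ll0', 'basic'));       // "hello"
console.log(decodeLeet('p@$$w0rd', 'basic'));    // "p@$$word"
console.log(decodeLeet('p@$$w0rd', 'moderate')); // "password"
```

A decoder like this would run before the dictionary lookup, which also shows why stricter levels cost more: each added table widens the set of inputs that must be rewritten and re-checked.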

Template 3 – Multi-Language Detection

// Detect a specific set of languages
const filter = new Filter({
  languages: ['english', 'spanish', 'french', 'german']
});

// Detect all supported languages
const filterAll = new Filter({ allLanguages: true });

Supported languages

arabic, chinese, czech, danish, dutch, english, esperanto,
finnish, french, german, hindi, hungarian, italian, japanese,
korean, norwegian, persian, polish, portuguese, russian,
spanish, swedish, thai, turkish

Template 4 – Auto-Replace Profanity

const filter = new Filter({
  replaceWith: '***',
  detectLeetspeak: true
});

const result = filter.checkProfanity('What the f4ck');
console.log(result.processedText); // "What the ***"

Custom replacement patterns

// Asterisks matching word length
{ replaceWith: '*' }        // "f**k" → "****"

// Fixed replacement
{ replaceWith: '[FILTERED]' } // "f**k" → "[FILTERED]"

// Character‑based
{ replaceWith: '#' }        // "f**k" → "####"
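
The examples above suggest a simple rule: a single-character `replaceWith` is repeated to the length of the word, while a longer string is substituted as-is. The sketch below implements that inferred rule. It is an assumption drawn from the examples, not the library's documented behavior, and `maskWord` and `maskText` are hypothetical helpers.

```javascript
// Assumed rule: one character is repeated per letter; longer strings are fixed.
function maskWord(word, replaceWith) {
  return replaceWith.length === 1 ? replaceWith.repeat(word.length) : replaceWith;
}

// Replace every whole-word, case-insensitive occurrence of a blocked word.
function maskText(text, blockedWords, replaceWith) {
  const blocked = new Set(blockedWords.map(w => w.toLowerCase()));
  return text.replace(/\w+/g, token =>
    blocked.has(token.toLowerCase()) ? maskWord(token, replaceWith) : token
  );
}

console.log(maskText('What the heck', ['heck'], '*'));          // "What the ****"
console.log(maskText('What the heck', ['heck'], '#'));          // "What the ####"
console.log(maskText('What the heck', ['heck'], '[FILTERED]')); // "What the [FILTERED]"
```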

Template 5 – Severity-Based Moderation

import { Filter, SeverityLevel } from 'glin-profanity';

const filter = new Filter({ detectLeetspeak: true });
const result = filter.checkProfanity(userInput);

switch (result.maxSeverity) {
  case SeverityLevel.HIGH:
    blockMessage(result);
    notifyModerators(result);
    break;
  case SeverityLevel.MEDIUM:
    sendFiltered(result.processedText);
    flagForReview(result);
    break;
  case SeverityLevel.LOW:
    sendFiltered(result.processedText);
    break;
  default:
    send(userInput);
}

Template 6 – React Hook for Real-Time Input

import { useProfanityChecker } from 'glin-profanity';

function ChatInput() {
  const { result, checkText, isChecking } = useProfanityChecker({
    detectLeetspeak: true,
    languages: ['english']
  });

  return (
    <div>
      <input
        type="text"
        onChange={e => checkText(e.target.value)}
        placeholder="Type a message..."
        disabled={isChecking}
      />
      {result?.containsProfanity && (
        <p style={{ color: 'red' }}>
          Please remove inappropriate language.
        </p>
      )}
    </div>
  );
}

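Template 7 – ML Toxicity Detection (v3+)

Catches toxic content that contains no explicit profanity, for example:

"You're the worst player"
"Nobody wants you here"
"Just quit already"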

import { loadToxicityModel, checkToxicity } from 'glin-profanity/ml';

// Load once on app startup
await loadToxicityModel({ threshold: 0.9 });

// Check any text
const result = await checkToxicity("You're terrible at this");
if (result.toxic) {
  console.log('Toxic content detected');
}

Sample output

console.log(result);
// {
//   toxic: true,
//   categories: {
//     toxicity: 0.92,
//     insult: 0.87,
//     threat: 0.12,
//     identity_attack: 0.08,
//     obscene: 0.45
//   }
// }

Note: The ML model runs 100% locally. There are no API calls, and no data leaves your server.
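
The per-category scores in the sample output can also be inspected directly, for instance to log which categories triggered a decision. The helper below is a hypothetical caller-side sketch that operates on a plain object shaped like the `categories` field above. How the library itself applies `threshold` internally is not stated in this article.

```javascript
// Return the names of categories whose score meets the threshold, highest first.
function flaggedCategories(categories, threshold) {
  return Object.entries(categories)
    .filter(([, score]) => score >= threshold)
    .sort(([, a], [, b]) => b - a)
    .map(([name]) => name);
}

// Scores copied from the sample output above.
const sample = {
  toxicity: 0.92,
  insult: 0.87,
  threat: 0.12,
  identity_attack: 0.08,
  obscene: 0.45,
};

console.log(flaggedCategories(sample, 0.9));  // ["toxicity"]
console.log(flaggedCategories(sample, 0.85)); // ["toxicity", "insult"]
```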

Template 8 – Complete Chat Moderation Pipeline

import { Filter, SeverityLevel } from 'glin-profanity';
import { loadToxicityModel, checkToxicity } from 'glin-profanity/ml';

// Setup
const filter = new Filter({
  languages: ['english', 'spanish'],
  detectLeetspeak: true,
  leetspeakLevel: 'moderate',
  normalizeUnicode: true,
  replaceWith: '***',
});

await loadToxicityModel({ threshold: 0.85 });

// Moderation function
async function moderateMessage(text) {
  // 1️⃣ Fast rule-based check
  const profanity = filter.checkProfanity(text);

  // 2️⃣ ML toxicity check
  const toxicity = await checkToxicity(text);

  // 3️⃣ Decision logic
  if (profanity.maxSeverity === SeverityLevel.HIGH) {
    return { action: 'block', reason: 'severe_profanity' };
  }

  if (toxicity.toxic) {
    return {
      action: 'flag',
      text: profanity.processedText,
      reason: 'toxic_content',
    };
  }

  if (profanity.containsProfanity) {
    return { action: 'filter', text: profanity.processedText };
  }

  return { action: 'allow', text };
}

// Usage
const result = await moderateMessage('User message here');
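
What the caller does with the returned `action` is left open above. One possible way to act on it is sketched below. The `hooks` object and its `deliver`, `reject`, and `queueForReview` methods are hypothetical stand-ins for your own transport and review-queue code, and delivering a flagged message in filtered form is a policy choice rather than something the pipeline prescribes.

```javascript
// Act on a decision object shaped like the ones moderateMessage returns.
// Returns true when something was delivered to other users.
function applyDecision(decision, hooks) {
  switch (decision.action) {
    case 'block':
      hooks.reject(decision.reason);
      return false;
    case 'flag':
      // Deliver the filtered text, but also queue it for human review.
      hooks.queueForReview(decision.text, decision.reason);
      hooks.deliver(decision.text);
      return true;
    case 'filter':
    case 'allow':
      hooks.deliver(decision.text);
      return true;
    default:
      throw new Error(`Unknown action: ${decision.action}`);
  }
}

// Example wiring with in-memory hooks that just record what happened.
const log = [];
const hooks = {
  deliver: text => log.push(['deliver', text]),
  reject: reason => log.push(['reject', reason]),
  queueForReview: (text, reason) => log.push(['review', text, reason]),
};

applyDecision({ action: 'filter', text: 'What the ***' }, hooks);
applyDecision({ action: 'block', reason: 'severe_profanity' }, hooks);
console.log(log);
```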

Template 9 – Express.js Middleware

import { Filter } from 'glin-profanity';
import express from 'express';
import { commentHandler } from './handlers/commentHandler.js';
import { getNestedValue } from './utils/getNestedValue.js';

const app = express();
app.use(express.json()); // parse JSON bodies so req.body is populated

const filter = new Filter({
  detectLeetspeak: true,
  languages: ['english'],
});

function profanityMiddleware(req, res, next) {
  // Fields to scan (dot notation)
  const fieldsToCheck = ['body.message', 'body.comment', 'body.bio'];

  for (const field of fieldsToCheck) {
    const value = getNestedValue(req, field);
    if (value && filter.isProfane(value)) {
      return res.status(400).json({
        error: 'Content contains inappropriate language',
      });
    }
  }

  next();
}

// Example route
app.post('/api/comments', profanityMiddleware, commentHandler);

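How it works

Initialize the profanity filter – glin-profanity is configured to detect leetspeak and to use the English dictionary.
Define the middleware – profanityMiddleware iterates over a list of fields that may contain user-generated text.
Extract nested values – getNestedValue(req, field) safely reads a value from a dot-notation path (e.g. req.body.message).
Check for profanity – if any field contains inappropriate language, the request is rejected with a 400 Bad Request response.
Continue when clean – if no profanity is found, next() passes control to the next handler (commentHandler in this example).

Adding the middleware to other routes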

app.put('/api/profile', profanityMiddleware, profileUpdateHandler);
app.post('/api/posts', profanityMiddleware, postCreateHandler);

Utility function: `getNestedValue`

// utils/getNestedValue.js
export function getNestedValue(obj, path) {
  return path.split('.').reduce((acc, key) => (acc ? acc[key] : undefined), obj);
}
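
A short usage sketch shows what the helper returns for present, missing, and deeply missing paths. The function is repeated without `export` so the snippet runs on its own, and the `req` object is a made-up stand-in for an Express request.

```javascript
// Same logic as utils/getNestedValue.js, inlined so this snippet is standalone.
function getNestedValue(obj, path) {
  return path.split('.').reduce((acc, key) => (acc ? acc[key] : undefined), obj);
}

// Hypothetical request-like object.
const req = { body: { message: 'hello', profile: { bio: '' } } };

console.log(getNestedValue(req, 'body.message'));      // "hello"
console.log(getNestedValue(req, 'body.comment'));      // undefined
console.log(getNestedValue(req, 'query.search.term')); // undefined (no throw)
console.log(getNestedValue(req, 'body.profile.bio'));  // ""
```

Because the reducer tests `acc` for truthiness, a falsy intermediate value such as an empty string or 0 stops the walk and yields undefined. That is harmless for the middleware, which skips empty values anyway.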

Template 10 – Custom Whitelist / Blacklist

const filter = new Filter({
  languages: ['english'],
  ignoreWords: ['hell', 'damn'],   // allow these words
  customWords: ['badword', 'toxic'] // add custom blocked words
});
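
Conceptually, these two options adjust the effective word list: `customWords` adds entries and `ignoreWords` removes them. The sketch below models that with a Set. The three-word base dictionary is invented for the example, and letting the ignore list win over the custom list is an assumption, since the article does not say which takes precedence.

```javascript
// Effective word set = (base dictionary + customWords) minus ignoreWords.
function buildWordSet(baseWords, { customWords = [], ignoreWords = [] } = {}) {
  const words = new Set([...baseWords, ...customWords].map(w => w.toLowerCase()));
  for (const w of ignoreWords) words.delete(w.toLowerCase());
  return words;
}

// Made-up base dictionary for illustration.
const words = buildWordSet(['hell', 'damn', 'crap'], {
  ignoreWords: ['hell', 'damn'],
  customWords: ['badword', 'toxic'],
});

console.log([...words]); // ["crap", "badword", "toxic"]
```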

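Architecture

Performance Benchmarks

Operation	Speed
Simple check	21M ops/sec
With leetspeak (moderate)	8.5M ops/sec
Multi-language (3 languages)	18M ops/sec
Unicode normalization	15M ops/sec

Results are cached using an LRU policy.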
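
For readers unfamiliar with LRU (least recently used) caching, here is a minimal sketch of the idea built on a JavaScript `Map`, which iterates keys in insertion order. It is a generic illustration, not the library's cache: `LRUCache`, `cached`, and the default capacity of 1000 are all assumptions.

```javascript
class LRUCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.map = new Map(); // first key in iteration order = least recently used
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    this.map.delete(key); // re-insert to mark as most recently used
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    if (this.map.has(key)) {
      this.map.delete(key);
    } else if (this.map.size >= this.capacity) {
      this.map.delete(this.map.keys().next().value); // evict the oldest entry
    }
    this.map.set(key, value);
  }
}

// Wrap a check function so repeated inputs skip the expensive work.
function cached(fn, capacity = 1000) {
  const cache = new LRUCache(capacity);
  return text => {
    let result = cache.get(text);
    if (result === undefined) {
      result = fn(text);
      cache.set(text, result);
    }
    return result;
  };
}
```

Chat traffic tends to repeat short messages, which is the kind of workload where a cache like this pays off. Benchmark figures measured with a warm cache mostly reflect lookup speed rather than the cost of a first-time check.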

API Quick Reference

Filter options

interface FilterOptions {
  languages?: string[];               // ['english', 'spanish', …]
  allLanguages?: boolean;              // Check all 23 languages
  detectLeetspeak?: boolean;           // Enable leetspeak detection
  leetspeakLevel?: 'basic' | 'moderate' | 'aggressive';
  normalizeUnicode?: boolean;           // Handle Unicode homoglyphs
  replaceWith?: string;                 // Replacement character/string
  ignoreWords?: string[];              // Whitelist
  customWords?: string[];              // Additional blocked words
}

Result object

interface CheckResult {
  containsProfanity: boolean;
  profaneWords: string[];
  processedText: string;   // Text after replacements are applied
  maxSeverity: SeverityLevel;
  matches: MatchDetail[];
}

Resources

Live demo – link
GitHub repository – link
npm package – link
PyPI package – link
Full documentation – link

Tags: javascript, typescript, python, react, opensource, webdev, contentmoderation, npm, profanityfilter
