I Let AI Write My Blog Posts, Then Graded Them. The Results Were Brutal.

Published: March 8, 2026, 12:32 (GMT+8)

6 min read

Source: Dev.to

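I write a lot: blog posts, docs, READMEs, roughly 2,000 words a week. Last month I had AI generate some blog content for me. It looked fine, professional even, but something felt off. So I ran it through a readability scorer. The numbers were brutal.

The Experiment

I picked four AI-generated blog paragraphs (the kind you get by typing "write a blog intro about web development" into ChatGPT or Claude) and four paragraphs I wrote myself. Then I scored all eight with textlens, an open-source text analysis library.

The scoring code (JavaScript):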

import { readability, sentiment } from 'textlens';

const aiText = `In today's rapidly evolving technological landscape,
developers are constantly seeking innovative solutions to streamline
their workflows and enhance productivity. The emergence of artificial
intelligence has fundamentally transformed the way we approach
software development, offering unprecedented opportunities for
automation and optimization.`;

const humanText = `I write a lot. Blog posts, docs, READMEs — probably
2,000 words a week. Last month I got lazy and let AI write three posts
for me. They looked fine. Professional, even. But something felt off.
So I ran them through a readability scorer. The numbers were bad.`;

console.log('AI:', readability(aiText));
console.log('Human:', readability(humanText));

The Scores

Lower grade level (FK Grade Level, Gunning Fog) = easier to read. Higher Flesch Reading Ease = more readable.

Metric	AI-written (avg)	Human-written (avg)	Winner
Flesch Reading Ease	-4.7	73.8	Human
FK Grade Level	19.9	5.1	Human
Gunning Fog Index	24.9	7.5	Human

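The AI text's Flesch Reading Ease is negative, which makes it harder to read than a medical research paper. Its 19.9 grade level means you'd need roughly PhD-candidate reading skills to comfortably get through the opening of a blog post. The human-written text, by contrast, averaged a 5th-grade level: almost any teenager can read it.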
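How can a score drop below zero? The Flesch formula starts at 206.835 and subtracts a penalty for words per sentence and a penalty for syllables per word, with no floor. Plugging in illustrative inputs (35 words per sentence, 2.2 syllables per word; these are made-up numbers for the arithmetic, not measurements from my samples):

```latex
\mathrm{FRE} = 206.835 - 1.015\,\frac{\text{words}}{\text{sentences}} - 84.6\,\frac{\text{syllables}}{\text{words}}

\mathrm{FRE} = 206.835 - 1.015 \times 35 - 84.6 \times 2.2 = 206.835 - 35.525 - 186.12 = -14.81
```

The syllable term does most of the damage: every extra 0.1 syllables per word costs 8.46 points, while each extra word per sentence costs only about 1.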

Why AI Text Scores So Poorly

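Readability formulas measure sentence length and syllable count. AI tends to write long compound sentences packed with multi-syllable jargon, and both push the scores in the wrong direction.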
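To see what a scorer is actually counting, here's a minimal sketch of the two Flesch formulas in plain JavaScript. It's my own approximation, not textlens's code: countSyllables and textStats are names I made up, the syllable counter is a rough vowel-group heuristic, and sentences are split naively on ., ! and ?. Expect numbers in the same ballpark as a real tool, not identical ones.

```javascript
// Minimal sketch of the classic Flesch formulas. Assumption: textlens
// likely tokenizes and counts syllables differently, so scores won't match.

// Rough syllable estimate: one per run of vowels, minus a trailing
// silent "e" (but not "-le", as in "simple"). A heuristic, not a dictionary.
function countSyllables(word) {
  const w = word.toLowerCase().replace(/[^a-z]/g, '');
  if (w.length === 0) return 0;
  let count = (w.match(/[aeiouy]+/g) || []).length;
  if (w.endsWith('e') && !w.endsWith('le') && count > 1) count -= 1;
  return Math.max(count, 1);
}

// Counts sentences (split on . ! ?), words, and total syllables.
function textStats(text) {
  const sentences = text.split(/[.!?]+/).filter((s) => s.trim().length > 0).length;
  const words = text.match(/[A-Za-z']+/g) || [];
  const syllables = words.reduce((sum, w) => sum + countSyllables(w), 0);
  return { sentences, words: words.length, syllables };
}

// Higher = easier. Long sentences and long words both pull it down.
function fleschReadingEase(text) {
  const { sentences, words, syllables } = textStats(text);
  if (sentences === 0 || words === 0) return NaN;
  return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words);
}

// Approximate US school grade needed to follow the text.
function fleschKincaidGrade(text) {
  const { sentences, words, syllables } = textStats(text);
  if (sentences === 0 || words === 0) return NaN;
  return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59;
}

const ai =
  'The implementation of sentiment analysis algorithms represents a fascinating ' +
  'intersection of natural language processing and machine learning techniques.';
const human = 'Sentiment analysis sounds complicated, but the code is simple.';

console.log('AI    FRE:', fleschReadingEase(ai).toFixed(1), 'FK:', fleschKincaidGrade(ai).toFixed(1));
console.log('Human FRE:', fleschReadingEase(human).toFixed(1), 'FK:', fleschKincaidGrade(human).toFixed(1));
```

Both formulas reuse the same three counts; they just weight them differently, which is why they almost always agree on which text is harder.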

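The AI version:

“The implementation of sentiment analysis algorithms represents a fascinating intersection of natural language processing and machine learning techniques.”

The human version:

“Sentiment analysis sounds complicated, but the code is simple.”

That AI sentence has a Gunning Fog index of 28.4 (graduate-school level), while the human sentence scores 7.5 (seventh-grade level).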
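Gunning Fog works much the same way, but instead of total syllables it counts "complex" words (three or more syllables). Here's a simplified sketch using my own naive syllable heuristic and none of the official exclusions, so don't expect it to reproduce the 28.4 and 7.5 above exactly. gunningFog is a hypothetical name, not a textlens export.

```javascript
// Simplified Gunning Fog sketch (my own, not textlens's implementation).
// Assumption: a "complex" word is any word with 3+ estimated syllables.
// The full Gunning rules also skip proper nouns, familiar jargon, compound
// words, and words that only reach 3 syllables via -es/-ed/-ing, so tools
// that apply them may report lower numbers than this.

// Rough syllable estimate: one per vowel run, minus a trailing silent "e".
function countSyllables(word) {
  const w = word.toLowerCase().replace(/[^a-z]/g, '');
  if (w.length === 0) return 0;
  let count = (w.match(/[aeiouy]+/g) || []).length;
  if (w.endsWith('e') && !w.endsWith('le') && count > 1) count -= 1;
  return Math.max(count, 1);
}

function gunningFog(text) {
  const sentences = text.split(/[.!?]+/).filter((s) => s.trim().length > 0).length;
  const words = text.match(/[A-Za-z']+/g) || [];
  if (sentences === 0 || words.length === 0) return NaN;
  const complex = words.filter((w) => countSyllables(w) >= 3).length;
  // 0.4 x (average sentence length + percentage of complex words)
  return 0.4 * (words.length / sentences + 100 * (complex / words.length));
}

const ai =
  'The implementation of sentiment analysis algorithms represents a fascinating ' +
  'intersection of natural language processing and machine learning techniques.';
const human = 'Sentiment analysis sounds complicated, but the code is simple.';

console.log('AI fog:   ', gunningFog(ai).toFixed(1));
console.log('Human fog:', gunningFog(human).toFixed(1));
```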

AI also loves filler words like leverage, innovative, comprehensive, and unprecedented, which add syllables without adding meaning. Real developers tend to say use, new, and full instead.
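These are easy to catch mechanically. A small hypothetical helper (flagFillerWords is my own name, not a textlens function) that flags the three word pairs named in the paragraph and suggests the plain swap. It only does exact, case-insensitive matches, so "leveraging" slips through; extend the map with your own pet words.

```javascript
// Hypothetical filler-word map: the three pairs from the text above.
// A Map (not a plain object) avoids false hits on words like "constructor".
const PLAIN_WORDS = new Map([
  ['leverage', 'use'],
  ['innovative', 'new'],
  ['comprehensive', 'full'],
]);

// Returns every filler word found, in order, with its suggested replacement.
function flagFillerWords(text) {
  const hits = [];
  for (const word of text.match(/[A-Za-z]+/g) || []) {
    const suggestion = PLAIN_WORDS.get(word.toLowerCase());
    if (suggestion) hits.push({ word, suggestion });
  }
  return hits;
}

const draft = 'We leverage innovative tools to deliver a comprehensive solution.';
for (const { word, suggestion } of flagFillerWords(draft)) {
  console.log(`"${word}" -> try "${suggestion}"`);
}
```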

The Sentiment Surprise

The AI text consistently scored more positive in sentiment analysis, while my writing hovered around neutral.

import { sentiment } from 'textlens';

// aiText and humanText are the same strings defined in the scoring snippet.
const aiResult = sentiment(aiText);
// { score: 4, comparative: 0.074, positive: ['innovative', ...], ... }

const humanResult = sentiment(humanText);
// { score: -1, comparative: -0.034, positive: [], negative: ['lazy', 'bad'] }

The AI stays relentlessly upbeat, sprinkling in words like exciting, powerful, exceptional, and revolutionary. My prose included honest words like lazy and bad, which make it read as more genuine.
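Output shaped like { score, comparative, positive, negative } looks like a lexicon-based scorer: each known word carries a weight, the weights are summed, and comparative divides the sum by the token count. I haven't checked that textlens works exactly this way, so treat this as a toy sketch. scoreSentiment and its tiny lexicon are hypothetical, and the weights are made up, so the scores won't match the ones above.

```javascript
// Toy lexicon-based sentiment scorer. Words and weights are made up for
// illustration; real lexicons contain thousands of scored words.
const LEXICON = new Map([
  ['innovative', 2],
  ['exciting', 3],
  ['powerful', 2],
  ['exceptional', 3],
  ['revolutionary', 2],
  ['lazy', -1],
  ['bad', -2],
]);

function scoreSentiment(text) {
  const tokens = text.toLowerCase().match(/[a-z']+/g) || [];
  let score = 0;
  const positive = [];
  const negative = [];
  for (const token of tokens) {
    const weight = LEXICON.get(token);
    if (weight === undefined) continue; // unknown word: treated as neutral
    score += weight;
    (weight > 0 ? positive : negative).push(token);
  }
  // comparative = score per token, so long and short texts can be compared
  const comparative = tokens.length > 0 ? score / tokens.length : 0;
  return { score, comparative, positive, negative };
}

console.log(scoreSentiment('An exciting, powerful, revolutionary release.'));
console.log(scoreSentiment('I got lazy. The numbers were bad.'));
```

This also shows why AI text skews positive: every "exciting" or "revolutionary" adds to the sum, and nothing in marketing-style prose ever subtracts.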

What I Actually Do Now

I still use AI for first drafts, but I've added a scoring step to my workflow:

import { analyze } from 'textlens';

function checkDraft(text) {
  const result = analyze(text);
  const { fleschReadingEase, fleschKincaidGrade } = result.readability;
  let ok = true; // flips to false on any warning

  if (fleschKincaidGrade.score > 10) {
    ok = false;
    console.warn(`⚠️ Grade level ${fleschKincaidGrade.score}: too complex`);
    console.warn('Simplify sentences and reduce jargon.');
  }

  if (fleschReadingEase.score < 50) {
    ok = false;
    console.warn(`⚠️ Flesch score ${fleschReadingEase.score}: hard to read`);
  }

  // Only print the green check when both thresholds pass.
  if (ok) {
    console.log(`✅ Grade: ${fleschKincaidGrade.score} | Flesch: ${fleschReadingEase.score}`);
  }
  return ok;
}

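My rule: nothing goes out above an 8th-grade level. If AI hands me a 16th-grade paragraph, I rewrite it until the score comes down. That usually takes about 30 seconds of tweaking: shorter sentences, and plain words in place of rare ones.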
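The quickest win is usually the longest sentence. A small hypothetical helper for finding it: longSentences and the 20-word cutoff are my own assumptions, not part of textlens. It lists every sentence over the limit, longest first, so I know where to cut.

```javascript
// Hypothetical cutoff: my own assumption, not a textlens setting.
const MAX_WORDS = 20;

// Splits on whitespace that follows . ! or ?, counts words per sentence,
// and returns the sentences over the limit, longest first.
function longSentences(text, maxWords = MAX_WORDS) {
  return text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .map((sentence) => ({
      sentence,
      words: (sentence.match(/[A-Za-z0-9']+/g) || []).length,
    }))
    .filter(({ words }) => words > maxWords)
    .sort((a, b) => b.words - a.words);
}

const paragraph = [
  "In today's rapidly evolving technological landscape, developers are constantly seeking innovative solutions to streamline their workflows and enhance productivity.",
  'The emergence of artificial intelligence has fundamentally transformed the way we approach software development, offering unprecedented opportunities for automation and optimization.',
  'I write a lot.',
].join(' ');

for (const { words, sentence } of longSentences(paragraph)) {
  console.log(`${words} words: ${sentence}`);
}
```

Splitting at the comma in a flagged sentence is often enough to bring the grade level down a couple of points, since words per sentence feeds directly into every formula above.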

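The Takeaway

AI is great at producing a lot of content, but it often falls short on readability. The result is text that looks impressive but tends to underperform: higher bounce rates, lower engagement, and readers who skim and leave.

The fix isn't to stop using AI. It's to measure what you publish. Readability isn't subjective. It's math (sentence length, syllable count, word frequency), and you can check every one of those numbers before you publish.

Tool used: textlens, a zero-dependency Node.js text analysis tool. Install it with npm install textlens and try it with npx textlens "your text here".

What has your experience been with the quality of AI-generated content? Do you measure it, or just eyeball it?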
