I Let AI Write My Blog Posts, Then Scored Them for Quality — The Results Were Brutal
Source: Dev.to
I write a lot—blog posts, docs, READMEs—probably 2,000 words a week. Last month I let AI generate a few blog paragraphs. They looked fine, professional even, but something felt off, so I ran them through a readability scorer. The numbers were brutal.
The Experiment
I took four AI‑generated blog paragraphs (the kind ChatGPT/Claude produce when you say “write me a blog intro about web development”) and four paragraphs I’d written myself. Then I scored all eight using textlens, an open‑source text analysis library.
Scoring code (JavaScript):

```js
import { readability } from 'textlens';

const aiText = `In today's rapidly evolving technological landscape,
developers are constantly seeking innovative solutions to streamline
their workflows and enhance productivity. The emergence of artificial
intelligence has fundamentally transformed the way we approach
software development, offering unprecedented opportunities for
automation and optimization.`;

const humanText = `I write a lot. Blog posts, docs, READMEs — probably
2,000 words a week. Last month I got lazy and let AI write three posts
for me. They looked fine. Professional, even. But something felt off.
So I ran them through a readability scorer. The numbers were bad.`;

console.log('AI:', readability(aiText));
console.log('Human:', readability(humanText));
```
The Scores
Lower grade level = easier to read. Higher Flesch score = more readable.
| Metric | AI‑Written (avg) | Human‑Written (avg) | Winner |
|---|---|---|---|
| Flesch Reading Ease | ‑4.7 | 73.8 | Human |
| FK Grade Level | 19.9 | 5.1 | Human |
| Gunning Fog Index | 24.9 | 7.5 | Human |
The AI text scored a negative Flesch Reading Ease, meaning it is harder to read than a typical medical research paper. Its grade level of 19.9 implies roughly twenty years of formal education, PhD territory, just to comfortably parse a blog-post intro. By contrast, the human-written text averaged a grade-5 level, readable by any teenager.
Why AI Text Scores So Poorly
Readability formulas measure sentence length and syllable count. AI defaults to long, compound sentences packed with multi‑syllable jargon.
AI version:
“The implementation of sentiment analysis algorithms represents a fascinating intersection of natural language processing and machine learning technologies.”
Human version:
“Sentiment analysis sounds complex, but the code is simple.”
The AI sentence scores a Gunning Fog index of 28.4 (post‑graduate level) while the human sentence scores 7.5 (7th grade).
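For intuition, the two standard readability formulas are simple enough to compute by hand. Here is a minimal sketch of Flesch Reading Ease and Flesch-Kincaid Grade; this is not textlens internals, and the syllable counter is a rough vowel-group heuristic, so exact scores will differ from any real library. The relative ordering is what matters.

```js
// Rough vowel-group syllable heuristic (a silent trailing "e" is dropped).
function countSyllables(word) {
  const w = word.toLowerCase().replace(/[^a-z]/g, '');
  if (!w) return 0;
  const groups = w.replace(/e$/, '').match(/[aeiouy]+/g);
  return Math.max(1, groups ? groups.length : 1);
}

// Both formulas depend only on words-per-sentence and syllables-per-word.
function readabilityStats(text) {
  const sentences = text.split(/[.!?]+/).filter(s => s.trim()).length || 1;
  const words = text.split(/\s+/).filter(Boolean);
  const syllables = words.reduce((sum, w) => sum + countSyllables(w), 0);
  const wps = words.length / sentences; // words per sentence
  const spw = syllables / words.length; // syllables per word
  return {
    fleschReadingEase: 206.835 - 1.015 * wps - 84.6 * spw,
    fleschKincaidGrade: 0.39 * wps + 11.8 * spw - 15.59,
  };
}
```

Long sentences drive up words-per-sentence; jargon drives up syllables-per-word. AI prose maxes out both at once, which is why the scores collapse.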
AI also loves filler words—leverage, innovative, comprehensive, unprecedented—which add syllables without adding meaning. Real developers tend to say use, new, full.
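Spotting those filler words is mechanical enough to script. Here is a hypothetical helper (not part of textlens; the name `flagFiller` and the word list are mine) that flags the usual suspects and suggests plain replacements:

```js
// Common AI filler words mapped to plainer alternatives (illustrative list).
const FILLER = new Map([
  ['leverage', 'use'],
  ['utilize', 'use'],
  ['innovative', 'new'],
  ['comprehensive', 'full'],
  ['unprecedented', 'rare'],
]);

// Scan a draft and report each filler word with its suggested swap.
function flagFiller(text) {
  const hits = [];
  for (const word of text.toLowerCase().match(/[a-z]+/g) || []) {
    if (FILLER.has(word)) hits.push({ word, suggest: FILLER.get(word) });
  }
  return hits;
}
```

Each swap trims syllables, which directly lowers the grade level the formulas report.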
The Sentiment Surprise
AI text consistently scored more positive in sentiment analysis, whereas my writing hovered near neutral.
```js
import { sentiment } from 'textlens';

const aiResult = sentiment(aiText);
// { score: 4, comparative: 0.074, positive: ['innovative', ...], ... }

const humanResult = sentiment(humanText);
// { score: -1, comparative: -0.034, positive: [], negative: ['lazy', 'bad'] }
```
AI is relentlessly upbeat, peppering text with words like exciting, powerful, exceptional, revolutionary. My prose included honest words like lazy and bad, which feel more authentic to readers.
What I Actually Do Now
I still use AI for drafts, but I added a scoring step to my workflow:
```js
import { analyze } from 'textlens';

function checkDraft(text) {
  const { fleschReadingEase, fleschKincaidGrade } = analyze(text).readability;
  let ok = true;
  if (fleschKincaidGrade.score > 10) {
    console.warn(`⚠️ Grade level ${fleschKincaidGrade.score} — too complex`);
    console.warn('Simplify sentences and reduce jargon.');
    ok = false;
  }
  if (fleschReadingEase.score < 50) {
    console.warn(`⚠️ Flesch score ${fleschReadingEase.score} — hard to read`);
    ok = false;
  }
  if (ok) {
    console.log(`✅ Grade: ${fleschKincaidGrade.score} | Flesch: ${fleschReadingEase.score}`);
  }
}
```
My rule: nothing ships above grade 8. If AI gives me a grade‑16 paragraph, I rewrite it until the score drops—usually a 30‑second tweak: shorten sentences, swap fancy words.
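The rule can be encoded as a hard gate rather than a warning. This is a sketch under one assumption: your scorer returns the `{ readability: { fleschKincaidGrade: { score } } }` shape used above. The function name `enforceGradeCeiling` and the parameter `analyzeFn` are invented here, not a textlens API:

```js
// Hypothetical gate: throws if a draft exceeds the grade ceiling.
// analyzeFn is any scorer returning the shape used earlier in this post.
function enforceGradeCeiling(text, analyzeFn, maxGrade = 8) {
  const { score } = analyzeFn(text).readability.fleschKincaidGrade;
  if (score > maxGrade) {
    throw new Error(`Grade ${score.toFixed(1)} exceeds ceiling of ${maxGrade}; rewrite first`);
  }
  return score; // safe to publish
}
```

Wired into a pre-commit hook or CI step, a too-dense draft fails loudly instead of quietly shipping.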
The Takeaway
AI excels at generating volume but struggles with readability. The result is text that sounds impressive yet performs poorly—high bounce rates, low engagement, and readers who skim and leave.
The fix isn’t to avoid AI; it’s to measure what you publish. Readability isn’t subjective; it’s math: sentence length, syllable count, word frequency—numbers you can check before hitting publish.
Tool used: textlens — zero‑dependency text analysis for Node.js. Install with npm install textlens and try it via npx textlens "your text here".
What’s your experience with AI‑generated content quality? Have you measured it, or just eyeballed it?