I Let AI Write My Blog Posts, Then Scored Them for Quality — The Results Were Brutal
Source: Dev.to
I write a lot—blog posts, docs, READMEs—probably 2,000 words a week. Last month I let AI generate a few blog paragraphs. They looked fine, professional even, but something felt off, so I ran them through a readability scorer. The numbers were brutal.
The Experiment
I took four AI‑generated blog paragraphs (the kind ChatGPT/Claude produce when you say “write me a blog intro about web development”) and four paragraphs I’d written myself. Then I scored all eight using textlens, an open‑source text analysis library.
Scoring code (JavaScript):

```js
import { readability } from 'textlens';

const aiText = `In today's rapidly evolving technological landscape,
developers are constantly seeking innovative solutions to streamline
their workflows and enhance productivity. The emergence of artificial
intelligence has fundamentally transformed the way we approach
software development, offering unprecedented opportunities for
automation and optimization.`;

const humanText = `I write a lot. Blog posts, docs, READMEs — probably
2,000 words a week. Last month I got lazy and let AI write three posts
for me. They looked fine. Professional, even. But something felt off.
So I ran them through a readability scorer. The numbers were bad.`;

console.log('AI:', readability(aiText));
console.log('Human:', readability(humanText));
```
The Scores
Lower grade level = easier to read. Higher Flesch score = more readable.
| Metric | AI‑Written (avg) | Human‑Written (avg) | Winner |
|---|---|---|---|
| Flesch Reading Ease | ‑4.7 | 73.8 | Human |
| FK Grade Level | 19.9 | 5.1 | Human |
| Gunning Fog Index | 24.9 | 7.5 | Human |
The AI text scored a negative Flesch Reading Ease, meaning it is harder to read than a typical medical research paper. Its grade level of 19.9 implies roughly twenty years of formal education, PhD territory, just to comfortably parse a blog-post intro. By contrast, the human-written text averaged a grade-5 level, readable by any teenager.
Why AI Text Scores So Poorly
Readability formulas measure sentence length and syllable count. AI defaults to long, compound sentences packed with multi‑syllable jargon.
AI version:
“The implementation of sentiment analysis algorithms represents a fascinating intersection of natural language processing and machine learning technologies.”
Human version:
“Sentiment analysis sounds complex, but the code is simple.”
The AI sentence scores a Gunning Fog index of 28.4 (post‑graduate level) while the human sentence scores 7.5 (7th grade).
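For intuition, the two standard readability formulas are simple enough to compute by hand. Here is a minimal sketch of Flesch Reading Ease and Flesch-Kincaid Grade; this is not textlens internals, and the syllable counter is a rough vowel-group heuristic, so exact scores will differ from any real library. The relative ordering is what matters.

```js
// Rough vowel-group syllable heuristic (a silent trailing "e" is dropped).
function countSyllables(word) {
  const w = word.toLowerCase().replace(/[^a-z]/g, '');
  if (!w) return 0;
  const groups = w.replace(/e$/, '').match(/[aeiouy]+/g);
  return Math.max(1, groups ? groups.length : 1);
}

// Both formulas depend only on words-per-sentence and syllables-per-word.
function readabilityStats(text) {
  const sentences = text.split(/[.!?]+/).filter(s => s.trim()).length || 1;
  const words = text.split(/\s+/).filter(Boolean);
  const syllables = words.reduce((sum, w) => sum + countSyllables(w), 0);
  const wps = words.length / sentences; // words per sentence
  const spw = syllables / words.length; // syllables per word
  return {
    fleschReadingEase: 206.835 - 1.015 * wps - 84.6 * spw,
    fleschKincaidGrade: 0.39 * wps + 11.8 * spw - 15.59,
  };
}
```

Long sentences drive up words-per-sentence; jargon drives up syllables-per-word. AI prose maxes out both at once, which is why the scores collapse.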
AI also loves filler words—leverage, innovative, comprehensive, unprecedented—which add syllables without adding meaning. Real developers tend to say use, new, full.
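Spotting those filler words is mechanical enough to script. Here is a hypothetical helper (not part of textlens; the name `flagFiller` and the word list are mine) that flags the usual suspects and suggests plain replacements:

```js
// Common AI filler words mapped to plainer alternatives (illustrative list).
const FILLER = new Map([
  ['leverage', 'use'],
  ['utilize', 'use'],
  ['innovative', 'new'],
  ['comprehensive', 'full'],
  ['unprecedented', 'rare'],
]);

// Scan a draft and report each filler word with its suggested swap.
function flagFiller(text) {
  const hits = [];
  for (const word of text.toLowerCase().match(/[a-z]+/g) || []) {
    if (FILLER.has(word)) hits.push({ word, suggest: FILLER.get(word) });
  }
  return hits;
}
```

Each swap trims syllables, which directly lowers the grade level the formulas report.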
The Sentiment Surprise
AI text consistently scored more positive in sentiment analysis, whereas my writing hovered near neutral.
```js
import { sentiment } from 'textlens';

const aiResult = sentiment(aiText);
// { score: 4, comparative: 0.074, positive: ['innovative', ...], ... }

const humanResult = sentiment(humanText);
// { score: -1, comparative: -0.034, positive: [], negative: ['lazy', 'bad'] }
```
AI is relentlessly upbeat, peppering text with words like exciting, powerful, exceptional, revolutionary. My prose included honest words like lazy and bad, which feel more authentic to readers.
What I Actually Do Now
I still use AI for drafts, but I added a scoring step to my workflow:
```js
import { analyze } from 'textlens';

function checkDraft(text) {
  const { fleschReadingEase, fleschKincaidGrade } = analyze(text).readability;
  let ok = true;
  if (fleschKincaidGrade.score > 10) {
    console.warn(`⚠️ Grade level ${fleschKincaidGrade.score} — too complex`);
    console.warn('Simplify sentences and reduce jargon.');
    ok = false;
  }
  if (fleschReadingEase.score < 50) {
    console.warn(`⚠️ Flesch score ${fleschReadingEase.score} — hard to read`);
    ok = false;
  }
  if (ok) {
    console.log(`✅ Grade: ${fleschKincaidGrade.score} | Flesch: ${fleschReadingEase.score}`);
  }
}
```
My rule: nothing ships above grade 8. If AI gives me a grade‑16 paragraph, I rewrite it until the score drops—usually a 30‑second tweak: shorten sentences, swap fancy words.
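The rule can be encoded as a hard gate rather than a warning. This is a sketch under one assumption: your scorer returns the `{ readability: { fleschKincaidGrade: { score } } }` shape used above. The function name `enforceGradeCeiling` and the parameter `analyzeFn` are invented here, not a textlens API:

```js
// Hypothetical gate: throws if a draft exceeds the grade ceiling.
// analyzeFn is any scorer returning the shape used earlier in this post.
function enforceGradeCeiling(text, analyzeFn, maxGrade = 8) {
  const { score } = analyzeFn(text).readability.fleschKincaidGrade;
  if (score > maxGrade) {
    throw new Error(`Grade ${score.toFixed(1)} exceeds ceiling of ${maxGrade}; rewrite first`);
  }
  return score; // safe to publish
}
```

Wired into a pre-commit hook or CI step, a too-dense draft fails loudly instead of quietly shipping.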
The Takeaway
AI excels at generating volume but struggles with readability. The result is text that sounds impressive yet performs poorly—high bounce rates, low engagement, and readers who skim and leave.
The fix isn’t to avoid AI; it’s to measure what you publish. Readability isn’t subjective; it’s math: sentence length, syllable count, word frequency—numbers you can check before hitting publish.
Tool used: textlens — zero‑dependency text analysis for Node.js. Install with npm install textlens and try it via npx textlens "your text here".
What’s your experience with AI‑generated content quality? Have you measured it, or just eyeballed it?