Building a Transparent AI Pipeline: 59 Weeks of Automated Political Scoring with Claude API

Published: March 18, 2026
5 min read
Source: Dev.to


![Cover image for Building a Transparent AI Pipeline: 59 Weeks of Automated Political Scoring with Claude API](https://media2.dev.to/dynamic/image/width=1000,height=420,fit=cover,gravity=auto,format=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqi3e292ud2evff6smts3.png)


I've been running an automated AI pipeline for over a year that ingests news articles, clusters them into political events, and scores each event on two independent axes. Here's how it works, what I learned, and why I made everything transparent.

---

## The Problem

Political events have two dimensions that are rarely measured together:

- **How much institutional damage does this cause?** (democratic health)  
- **How much media attention does it get?** (distraction economics)

When these are wildly mismatched — high damage, low attention — something important is being missed. I built [The Distraction Index](https://distractionindex.org/) to detect these gaps automatically.

---

## Architecture Overview

```text
News Sources (GDELT + GNews + Google News RSS)
    ↓ every 4 hours
Ingestion Pipeline (/api/ingest)
    ↓ dedup + store
Clustering (Claude Haiku) → group articles into events
    ↓
Dual-Axis Scoring (Claude Sonnet) → Score A + Score B
    ↓
Weekly Freeze → immutable snapshot
```

Tech stack: Next.js 16 (App Router), Supabase (PostgreSQL), Claude API, Vercel
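The "dedup + store" step in the diagram is the part that keeps three overlapping news sources from triple-counting the same story. A minimal sketch in TypeScript — the `Article` shape and `normalizeUrl` rules are my illustration, not the project's actual code:

```typescript
// Sketch of the dedup step: drop articles whose canonical URL has been seen.
// The Article shape and normalization rules are illustrative assumptions.
interface Article {
  url: string;
  title: string;
  publishedAt: string;
}

// Strip query strings, fragments, and trailing slashes so syndicated
// copies of the same story collide on one key.
function normalizeUrl(raw: string): string {
  const u = new URL(raw);
  return (u.origin + u.pathname).replace(/\/$/, "").toLowerCase();
}

// Return only articles not already in `seen`, updating `seen` as we go.
function dedupe(batch: Article[], seen: Set<string>): Article[] {
  const fresh: Article[] = [];
  for (const a of batch) {
    const key = normalizeUrl(a.url);
    if (!seen.has(key)) {
      seen.add(key);
      fresh.push(a);
    }
  }
  return fresh;
}
```

In production the `seen` set would be a unique index in Supabase rather than in-memory state, but the collision rule is the interesting part.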


## Why Two Models?

Cost optimization was critical. Running everything through Sonnet would cost ~$300/month. Instead:

| Model | Role | Cost (per 1M tokens) |
| --- | --- | --- |
| Claude Haiku | Article clustering | $0.25 |
| Claude Sonnet | Scoring (institutional impact) | $3.00 |

Result: ~$30/month for a production pipeline processing articles every 4 hours.
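A quick back-of-envelope model makes the split concrete. Prices come from the table above; the per-run token volumes are illustrative assumptions, not the project's measured usage:

```typescript
// Cost model for the Haiku/Sonnet split. Prices per 1M tokens are from
// the table above; token volumes per run are illustrative assumptions.
const PRICE_PER_MTOK = { haiku: 0.25, sonnet: 3.0 } as const;

function monthlyCost(
  tokensPerRun: { haiku: number; sonnet: number },
  runsPerDay: number,
): number {
  const runs = runsPerDay * 30; // 30-day month
  const mtok = (n: number) => (n * runs) / 1_000_000;
  return (
    mtok(tokensPerRun.haiku) * PRICE_PER_MTOK.haiku +
    mtok(tokensPerRun.sonnet) * PRICE_PER_MTOK.sonnet
  );
}
```

The point the arithmetic makes: routing the high-volume clustering work through Haiku (12× cheaper) is what turns a ~$300/month bill into ~$30.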


## The Dual Scoring System

### Score A: Constitutional Damage (0–100)

Seven weighted governance drivers, each scored 0–5:

| Driver | Weight | What it measures |
| --- | --- | --- |
| Judicial Independence | 0.18 | Court stacking, ruling defiance |
| Press Freedom | 0.15 | Journalist targeting, access restrictions |
| Voting Rights | 0.15 | Disenfranchisement, election interference |
| Environmental Policy | 0.12 | Regulatory rollbacks, enforcement gaps |
| Civil Liberties | 0.15 | Due process, privacy, free assembly |
| International Norms | 0.10 | Treaty violations, alliance damage |
| Fiscal Governance | 0.15 | Budget manipulation, oversight bypass |

Each driver score is multiplied by severity modifiers (durability × reversibility × precedent) and mechanism/scope modifiers.
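As a sketch of the aggregation — the weights are from the table, but the exact combination formula isn't reproduced here, so this assumes a weighted mean of the 0–5 driver scores, scaled to 0–100, then damped by multiplicative severity modifiers in [0, 1]:

```typescript
// Hedged sketch of Score A aggregation. Weights are from the driver table;
// the weighted-mean-then-multiply structure is an assumption.
const DRIVER_WEIGHTS: Record<string, number> = {
  judicialIndependence: 0.18,
  pressFreedom: 0.15,
  votingRights: 0.15,
  environmentalPolicy: 0.12,
  civilLiberties: 0.15,
  internationalNorms: 0.10,
  fiscalGovernance: 0.15, // weights sum to 1.00
};

interface SeverityModifiers {
  durability: number;   // each assumed to lie in [0, 1]
  reversibility: number;
  precedent: number;
}

function scoreA(
  drivers: Record<string, number>, // each driver scored 0–5
  mods: SeverityModifiers,
): number {
  let weighted = 0;
  for (const [name, w] of Object.entries(DRIVER_WEIGHTS)) {
    weighted += w * (drivers[name] ?? 0);
  }
  const base = (weighted / 5) * 100; // 0–5 weighted mean → 0–100
  const severity = mods.durability * mods.reversibility * mods.precedent;
  return Math.round(base * severity);
}
```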

### Score B: Distraction/Hype (0–100)

Two-layer model:

| Layer | Weight | Description |
| --- | --- | --- |
| Layer 1 | 55% | Raw media hype: volume, social amplification, cross-platform spread, emotional framing, celebrity involvement |
| Layer 2 | 45% | Strategic manipulation indicators: timing relative to damage events, coordinated messaging, deflection patterns |

Layer 2 is modulated by an intentionality score (0–15). Low intentionality drops Layer 2's weight to 10%.
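The gating logic, as a sketch. The 55/45 and 10% weights are from the text; the "low intentionality" threshold and the choice to renormalize Layer 1 to 90% are my assumptions, since the post doesn't spell them out:

```typescript
// Sketch of two-layer Score B with intentionality gating.
// Threshold and Layer 1 renormalization are assumptions.
function scoreB(
  layer1: number,        // raw media hype, 0–100
  layer2: number,        // strategic manipulation indicators, 0–100
  intentionality: number // 0–15
): number {
  const lowIntent = intentionality < 5; // assumed cutoff
  const w2 = lowIntent ? 0.10 : 0.45;  // Layer 2 weight drops when intent is weak
  const w1 = 1 - w2;                   // assumed: Layer 1 absorbs the rest
  return Math.round(w1 * layer1 + w2 * layer2);
}
```

The effect is that a pure media circus with no evidence of deliberate timing still scores on hype, but the "manipulation" signal barely moves the needle.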


## Classification

Events are classified by dominance margin:

| Class | Condition |
| --- | --- |
| Damage (List A) | Score A exceeds Score B by ≥ 10 points |
| Distraction (List B) | Score B exceeds Score A by ≥ 10 points |
| Noise (List C) | Neither dominates |
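The dominance-margin rule from the table, as a function:

```typescript
// Classify an event by dominance margin (default 10 points, per the table).
type EventClass = "A" | "B" | "C"; // Damage, Distraction, Noise

function classify(a: number, b: number, margin = 10): EventClass {
  if (a - b >= margin) return "A";
  if (b - a >= margin) return "B";
  return "C";
}
```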

## The Smokescreen Index

The most interesting feature: automatic pairing of high-distraction events with concurrent high-damage events. When a B-dominant event (media spectacle) co-occurs with an A-dominant event (institutional harm) that received less coverage, the system flags it as a potential smokescreen.

- 210+ pairs identified across 59 weeks.
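The pairing rule can be sketched as a join over scored events. The field names and the definition of "concurrent" (same ISO week) are assumptions for illustration:

```typescript
// Sketch of smokescreen pairing: for each B-dominant (distraction) event,
// find concurrent A-dominant (damage) events that drew less coverage.
// Field names and the same-week concurrency rule are assumptions.
interface ScoredEvent {
  id: string;
  week: string;        // e.g. "2026-W11"
  cls: "A" | "B" | "C";
  coverage: number;    // article count as a coverage proxy
}

function smokescreenPairs(events: ScoredEvent[]): Array<[string, string]> {
  const pairs: Array<[string, string]> = [];
  for (const b of events.filter((e) => e.cls === "B")) {
    for (const a of events.filter((e) => e.cls === "A")) {
      if (a.week === b.week && a.coverage < b.coverage) {
        pairs.push([b.id, a.id]); // spectacle paired with under-covered harm
      }
    }
  }
  return pairs;
}
```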

## Radical Transparency

Every scoring formula, weight, and AI prompt is published at /methodology. This was a deliberate design choice — if you're scoring political events, your methodology must be auditable.

Key transparency features:

- **Immutable weekly snapshots** – once a week freezes, scores cannot be silently changed.
- **Append-only corrections** – post-freeze corrections are timestamped and linked to the original.
- **Published prompts** – the exact Claude prompts used for scoring are documented.
- **Open source** – full codebase on GitHub.
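One way to implement the first two features is to hash each frozen snapshot so later tampering is detectable, and record corrections as separate rows that reference the digest. This is my illustration of the pattern, not the site's actual schema:

```typescript
// Sketch of an immutable weekly freeze: hash the snapshot so any later
// edit is detectable, and model corrections as append-only records.
import { createHash } from "node:crypto";

interface Snapshot {
  week: string;
  scores: Record<string, number>;
}

function freeze(snap: Snapshot): { snap: Readonly<Snapshot>; digest: string } {
  const digest = createHash("sha256")
    .update(JSON.stringify(snap))
    .digest("hex");
  return { snap: Object.freeze(snap), digest };
}

// A correction never mutates the frozen snapshot; it points at it by digest.
interface Correction {
  refDigest: string;
  field: string;
  newValue: number;
  at: string; // ISO timestamp
}
```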

## What I Learned

1. **Publishing your prompts is terrifying.** When your prompt templates are public, anyone can argue with your framing. That's the point — but it requires thick skin and a willingness to iterate.
2. **Immutability prevents model drift.** Without frozen snapshots, you can't tell whether score changes come from real-world changes or model updates. Immutability is essential for longitudinal analysis.
3. **The two-axis approach reveals patterns.** Single-dimension scoring (left/right, reliable/unreliable) misses the key insight: damage and distraction are independent variables. Some events are both; some are neither.
4. **Cost optimization matters for indie projects.** The Haiku-for-clustering, Sonnet-for-scoring split keeps costs at ~$30/month. Without this, the project wouldn't be sustainable as a solo effort.


## The Numbers (after 59 weeks)

- 1,500+ scored events
- 11,800+ ingested articles
- 210+ smokescreen pairs
- 288 tests passing
- 1,071 pages indexed

## Try It

- Live site: [distractionindex.org](https://distractionindex.org/)
- Methodology: /methodology
- Source code:

I’d love feedback on the scoring methodology!

