Building a Transparent AI Pipeline: 59 Weeks of Automated Political Scoring with Claude API
Source: Dev.to

I've been running an automated AI pipeline for over a year that ingests news articles, clusters them into political events, and scores each event on two independent axes. Here's how it works, what I learned, and why I made everything transparent.
---
## The Problem
Political events have two dimensions that are rarely measured together:
- **How much institutional damage does this cause?** (democratic health)
- **How much media attention does it get?** (distraction economics)
When these are wildly mismatched — high damage, low attention — something important is being missed. I built [The Distraction Index](https://distractionindex.org/) to detect these gaps automatically.
---
## Architecture Overview
```text
News Sources (GDELT + GNews + Google News RSS)
↓ every 4 hours
Ingestion Pipeline (/api/ingest)
↓ dedup + store
Clustering (Claude Haiku) → group articles into events
↓
Dual‑Axis Scoring (Claude Sonnet) → Score A + Score B
↓
Weekly Freeze → immutable snapshot
```

Tech stack: Next.js 16 (App Router), Supabase (PostgreSQL), Claude API, Vercel
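The "dedup + store" step can be sketched compactly. A minimal version, assuming deduplication keys on a hash of a normalized URL so the same article arriving via GDELT, GNews, and RSS is stored once (the actual normalization rules may differ):

```typescript
import { createHash } from "node:crypto";

// Build a stable key for an article URL: drop fragments and query
// params (tracking junk like utm_*), trim trailing slashes, lowercase.
function articleKey(url: string): string {
  const u = new URL(url);
  u.hash = "";
  u.search = "";
  const normalized = u.toString().replace(/\/$/, "").toLowerCase();
  return createHash("sha256").update(normalized).digest("hex");
}

const seen = new Set<string>();

// Returns true if the article is new (and should be stored),
// false if it's a duplicate of something already ingested.
function ingest(url: string): boolean {
  const key = articleKey(url);
  if (seen.has(key)) return false;
  seen.add(key);
  return true;
}
```

In production the `seen` set would be a unique index in Supabase rather than in-memory state, but the keying idea is the same.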
## Why Two Models?
Cost optimization was critical. Running everything through Sonnet would cost ~$300/month. Instead:
| Model | Role | Cost (per 1 M tokens) |
|---|---|---|
| Claude Haiku | Article clustering | $0.25 |
| Claude Sonnet | Scoring (institutional impact) | $3.00 |
Result: ~$30/month for a production pipeline processing articles every 4 hours.
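The split can be sanity-checked with back-of-envelope arithmetic. The token volumes below are illustrative assumptions, not measured figures from the pipeline; the per-million-token rates come from the table above:

```typescript
type ModelRate = { inputPerMTok: number }; // USD per 1M input tokens

const HAIKU: ModelRate = { inputPerMTok: 0.25 };
const SONNET: ModelRate = { inputPerMTok: 3.0 };

// Monthly cost for a job that runs a fixed number of times per day.
function monthlyCost(tokensPerRun: number, runsPerDay: number, rate: ModelRate): number {
  const tokensPerMonth = tokensPerRun * runsPerDay * 30;
  return (tokensPerMonth / 1_000_000) * rate.inputPerMTok;
}

// Every-4-hours pipeline → 6 runs/day. Bulk article text goes to Haiku
// for clustering; only condensed event summaries reach Sonnet.
const clusterCost = monthlyCost(400_000, 6, HAIKU);  // ≈ $18/month
const scoreCost = monthlyCost(20_000, 6, SONNET);    // ≈ $10.80/month
```

The point of the exercise: because clustering consumes an order of magnitude more tokens than scoring, routing it to the cheap model is where nearly all the savings come from.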
## The Dual Scoring System
### Score A: Constitutional Damage (0‑100)
Seven weighted governance drivers, each scored 0‑5:
| Driver | Weight | What it measures |
|---|---|---|
| Judicial Independence | 0.18 | Court stacking, ruling defiance |
| Press Freedom | 0.15 | Journalist targeting, access restrictions |
| Voting Rights | 0.15 | Disenfranchisement, election interference |
| Environmental Policy | 0.12 | Regulatory rollbacks, enforcement gaps |
| Civil Liberties | 0.15 | Due process, privacy, free assembly |
| International Norms | 0.10 | Treaty violations, alliance damage |
| Fiscal Governance | 0.15 | Budget manipulation, oversight bypass |
Each driver score is multiplied by severity modifiers (durability × reversibility × precedent) and mechanism/scope modifiers.
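Putting the driver table and the modifier sentence together, the aggregation can be sketched as follows. The weights come from the table above; the modifier ranges, the normalization to 0‑100, and the clamping are my assumptions about how the pieces combine:

```typescript
// Driver weights from the published methodology table (sum to 1.0).
const WEIGHTS: Record<string, number> = {
  judicialIndependence: 0.18,
  pressFreedom: 0.15,
  votingRights: 0.15,
  environmentalPolicy: 0.12,
  civilLiberties: 0.15,
  internationalNorms: 0.10,
  fiscalGovernance: 0.15,
};

// Severity modifiers; the multiplicative combination is stated in the
// article, but the numeric ranges here are assumed for illustration.
interface Modifiers {
  durability: number;
  reversibility: number;
  precedent: number;
}

// Each driver is scored 0–5; the weighted sum is normalized to 0–100,
// then scaled by the severity product and clamped to the scale.
function scoreA(drivers: Record<string, number>, mods: Modifiers): number {
  let weighted = 0;
  for (const [driver, weight] of Object.entries(WEIGHTS)) {
    weighted += (drivers[driver] ?? 0) * weight;
  }
  const base = (weighted / 5) * 100;
  const severity = mods.durability * mods.reversibility * mods.precedent;
  return Math.min(100, Math.round(base * severity));
}
```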
### Score B: Distraction/Hype (0‑100)
Two‑layer model:
| Layer | Weight | Description |
|---|---|---|
| Layer 1 | 55% | Raw media hype: volume, social amplification, cross‑platform spread, emotional framing, celebrity involvement |
| Layer 2 | 45% | Strategic manipulation indicators: timing relative to damage events, coordinated messaging, deflection patterns |
Layer 2 is modulated by an intentionality score (0‑15). Low intentionality drops Layer 2’s weight to 10 %.
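A minimal sketch of the blend, assuming the low-intentionality cutoff sits below the midpoint of the 0‑15 scale (the exact threshold isn't specified above) and that Layer 1 absorbs the weight Layer 2 gives up:

```typescript
// Two-layer Score B: layer1 and layer2 are sub-scores on 0–100.
// Low intentionality collapses Layer 2's weight from 45% to 10%.
function scoreB(layer1: number, layer2: number, intentionality: number): number {
  const lowIntent = intentionality < 5; // assumed cutoff on the 0–15 scale
  const w2 = lowIntent ? 0.10 : 0.45;
  const w1 = 1 - w2;
  return Math.round(layer1 * w1 + layer2 * w2);
}
```

The effect: a pure media circus with no manipulation signal is mostly judged on raw hype, while a high-intentionality event lets the strategic layer pull the score up or down substantially.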
## Classification
Events are classified by dominance margin:
| Class | Condition |
|---|---|
| Damage (List A) | Score A exceeds Score B by ≥ 10 points |
| Distraction (List B) | Score B exceeds Score A by ≥ 10 points |
| Noise (List C) | Neither dominates |
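The classification rule from the table is simple enough to state directly in code:

```typescript
type EventClass = "damage" | "distraction" | "noise";

// Dominance-margin classification: one axis must lead by ≥ 10 points.
function classify(scoreA: number, scoreB: number, margin = 10): EventClass {
  if (scoreA - scoreB >= margin) return "damage";      // List A
  if (scoreB - scoreA >= margin) return "distraction"; // List B
  return "noise";                                      // List C
}
```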
## The Smokescreen Index
The most interesting feature: automatic pairing of high‑distraction events with concurrent high‑damage events. When a B‑dominant event (media spectacle) co‑occurs with an A‑dominant event (institutional harm) that received less coverage, the system flags it as a potential smokescreen.
- 210+ pairs identified across 59 weeks.
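A sketch of the pairing logic, assuming "concurrent" means "same weekly snapshot" and that an article count stands in for coverage (both are assumptions here; the real detector may use richer signals):

```typescript
interface ScoredEvent {
  id: string;
  scoreA: number;
  scoreB: number;
  mediaVolume: number; // article count as a coverage proxy (assumed)
  weekStart: string;   // ISO date of the weekly snapshot
}

// Pair each B-dominant spectacle with every concurrent A-dominant
// event that received less coverage than the spectacle did.
function smokescreenPairs(events: ScoredEvent[], margin = 10): [string, string][] {
  const pairs: [string, string][] = [];
  for (const spectacle of events) {
    if (spectacle.scoreB - spectacle.scoreA < margin) continue; // B-dominant only
    for (const harm of events) {
      if (harm.scoreA - harm.scoreB < margin) continue; // A-dominant only
      const concurrent = harm.weekStart === spectacle.weekStart;
      if (concurrent && harm.mediaVolume < spectacle.mediaVolume) {
        pairs.push([spectacle.id, harm.id]);
      }
    }
  }
  return pairs;
}
```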
## Radical Transparency
Every scoring formula, weight, and AI prompt is published at /methodology. This was a deliberate design choice — if you’re scoring political events, your methodology must be auditable.
Key transparency features:
- Immutable weekly snapshots – once a week freezes, scores cannot be silently changed.
- Append‑only corrections – post‑freeze corrections are timestamped and linked to the original.
- Published prompts – the exact Claude prompts used for scoring are documented.
- Open source – full codebase on GitHub.
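The freeze-plus-append model can be sketched as a small data structure. The field names here are assumptions; the real system keeps these rows in Supabase, but the invariant is the same: frozen scores are never mutated, and corrections only accumulate.

```typescript
interface Correction {
  originalEventId: string;
  correctedScoreA: number;
  correctedAt: string; // ISO timestamp
  reason: string;
}

class FrozenWeek {
  readonly corrections: Correction[] = [];
  constructor(readonly scores: ReadonlyMap<string, number>) {}

  // The only allowed post-freeze write: append, never overwrite.
  correct(c: Correction): void {
    this.corrections.push(c);
  }

  // Effective score = latest correction if one exists, else the frozen value.
  effectiveScore(eventId: string): number | undefined {
    const latest = [...this.corrections]
      .reverse()
      .find(c => c.originalEventId === eventId);
    return latest ? latest.correctedScoreA : this.scores.get(eventId);
  }
}
```

Because the original row survives every correction, anyone can replay the full history and see exactly what changed, when, and why.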
## What I Learned
1. **Publishing your prompts is terrifying.** When your prompt templates are public, anyone can argue with your framing. That's the point, but it requires thick skin and a willingness to iterate.
2. **Immutability prevents model drift.** Without frozen snapshots, you can't tell whether score changes come from real-world changes or model updates. Immutability is essential for longitudinal analysis.
3. **The two-axis approach reveals patterns.** Single-dimension scoring (left/right, reliable/unreliable) misses the key insight: damage and distraction are independent variables. Some events are both; some are neither.
4. **Cost optimization matters for indie projects.** The Haiku-for-clustering, Sonnet-for-scoring split keeps costs at ~$30/month. Without this, the project wouldn't be sustainable as a solo effort.
## The Numbers (after 59 weeks)
- 1,500+ scored events
- 11,800+ ingested articles
- 210+ smokescreen pairs
- 288 tests passing
- 1,071 pages indexed
## Try It
- Live site: [distractionindex.org](https://distractionindex.org/)
- Methodology: [distractionindex.org/methodology](https://distractionindex.org/methodology)
- Source code: on GitHub
I’d love feedback on the scoring methodology. What would you weight differently? What blind spots do you see?