I track 420 prediction sources with AI. Here's the open-source framework.

Published: February 7, 2026 at 11:51 PM EST
5 min read
Source: Dev.to

The problem

There is no standard infrastructure for prediction tracking.

Existing ecosystem   What it does
LangChain            Agents
HuggingFace          Models
Supabase             Back‑ends

But nothing answers the question:

“Who predicted what, when did they say it, and were they right?”

Signal Tracker solves that.

Install

pip install signal-tracker

Zero dependencies – stdlib only.
Python 3.10+.

5‑minute walkthrough

Track sources and claims

from signal_tracker import SignalTracker
from datetime import date

tracker = SignalTracker()

# Add sources
elon   = tracker.add_source("Elon Musk", source_type="person", category="tech")
cramer = tracker.add_source("Jim Cramer", source_type="person", category="finance")
imf    = tracker.add_source("IMF", source_type="institution", category="economics")

# Add predictions (keep the returned claims so we can verify them later)
claim1 = tracker.add_claim(
    source=elon,
    text="Tesla will achieve full self‑driving by end of 2025",
    target_date=date(2025, 12, 31),
)

claim2 = tracker.add_claim(
    source=cramer,
    text="Netflix will hit $800 by Q2 2025",
    target_date=date(2025, 6, 30),
)

claim3 = tracker.add_claim(
    source=imf,
    text="Global GDP growth will reach 3.2 % in 2025",
    target_date=date(2025, 12, 31),
)

Verify outcomes

tracker.verify(claim1, outcome="wrong",   reasoning="FSD not achieved by deadline")
tracker.verify(claim2, outcome="correct", reasoning="Netflix reached $820 in May")
tracker.verify(claim3, outcome="partial", reasoning="GDP grew 2.9 %, close but below target")

Build leaderboards

board = tracker.leaderboard(min_claims=3)

for entry in board.top_accurate:
    print(f"{entry.rank}. {entry.source.name}: {entry.score.accuracy_score}%")

Other useful views:

  • board.worst_accurate – bottom performers
  • board.biggest_risers – improving fast
  • board.biggest_fallers – getting worse
  • board.notable_wrongs – high‑profile misses

The scoring system

Accuracy scoring

  • Simple percentage‑based accuracy, with nuance:
    • Partial correctness weighting – configurable (default 0.5, i.e., a partial hit counts as half)
    • Minimum claim threshold – sources need at least 3 resolved claims for a meaningful score
  • Time‑windowed scoring – accuracy for 30 d, 90 d, 12 mo, and all‑time separately

windows = tracker.accuracy_scorer.score_windowed(claims, source_id=source.id)

for period, snapshot in windows.items():
    print(f"  {period}: {snapshot.accuracy_score}%")
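The accuracy formula itself is simple enough to sketch. Below is a hypothetical re-implementation (not the library's actual code) that applies the default 0.5 partial weight and the 3-claim minimum described above:

```python
def accuracy_score(outcomes, partial_weight=0.5, min_claims=3):
    """Percentage accuracy with partial credit — an illustrative
    sketch, not signal-tracker's real implementation."""
    if len(outcomes) < min_claims:
        return None  # too few resolved claims for a meaningful score
    credit = {"correct": 1.0, "partial": partial_weight, "wrong": 0.0}
    hits = sum(credit[o] for o in outcomes)
    return round(100 * hits / len(outcomes), 1)

print(accuracy_score(["correct", "partial", "wrong"]))  # 50.0
print(accuracy_score(["correct"]))                      # None
```

A partial hit contributing half a point is what separates "close but below target" from a flat miss.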

Recency‑weighted scoring

More recent predictions matter more. Uses exponential decay with a configurable half‑life:

from signal_tracker.scoring import AccuracyConfig

config = AccuracyConfig(recency_half_life_days=90)
tracker = SignalTracker(accuracy_config=config)

With a 90‑day half‑life, a prediction from last week carries roughly 16× the weight of one from a year ago. This surfaces sources that were historically good but have recently fallen off.
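The decay itself is just a half‑life curve; a two‑line sketch of the weighting (my own illustration, not the library's internals):

```python
def recency_weight(age_days, half_life_days=90):
    # Exponential decay: a claim's weight halves every half_life_days.
    return 0.5 ** (age_days / half_life_days)

# Week-old vs. year-old claim under a 90-day half-life:
print(round(recency_weight(7) / recency_weight(365), 1))  # 15.8
```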

Claim‑quality scoring

Not all predictions are created equal. “Things will get better eventually” is not the same as “Bitcoin will reach $150k by Q4 2025.”

The quality scorer rates each claim 0‑100 based on the table below.

Factor        Weight   What it checks
Time‑bound    30 %     Has a specific deadline?
Measurable    30 %     Has numeric targets?
Falsifiable   20 %     Clear success/failure criteria?
Recency       20 %     How recent is the claim?

from signal_tracker import QualityScorer

scorer = QualityScorer()
score = scorer.score(claim)          # e.g., 87.5 — highly trackable
high_quality = [c for c in claims if scorer.is_high_quality(c)]

The scorer uses regex patterns to detect prediction language, dollar amounts, percentages, date references, and hedge words. Vague language (“might”, “could”, “eventually”) gets penalized.
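The library's actual patterns and thresholds aren't shown here, but the mechanism can be sketched with hypothetical regexes wired to the weights from the table above:

```python
import re

# Hypothetical patterns — the library's real ones are internal.
HEDGES   = re.compile(r"\b(might|could|maybe|eventually|someday)\b", re.I)
NUMERIC  = re.compile(r"\$\d[\d,.]*[kKmMbB]?|\d+(\.\d+)?\s?%")
DEADLINE = re.compile(r"\b(by|before|in)\s+(Q[1-4]\s+)?(19|20)\d{2}\b", re.I)

def sketch_quality(text):
    score = 0
    score += 30 if DEADLINE.search(text) else 0    # time-bound
    score += 30 if NUMERIC.search(text) else 0     # measurable
    score += 20 if not HEDGES.search(text) else 0  # falsifiable
    score += 20                                    # assume a fresh claim
    return score

print(sketch_quality("Bitcoin will reach $150k by Q4 2025"))  # 100
print(sketch_quality("Things might get better eventually"))   # 20
```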

Extracting predictions from text

Rule‑based (fast, no API calls)

text = """
In his latest interview, the CEO predicted that revenue would 
exceed $10 billion by Q2 2025. He also forecast that the company 
would reach 100 million users within 18 months.
"""

claims = tracker.extract_claims(text, source=ceo)

for claim in claims:
    print(f"  {claim.text}")
    print(f"  Target: {claim.target_date}")
    print(f"  Category: {claim.category}")
    print(f"  Quality: {claim.quality_score}")

LLM‑powered (more accurate)

Bring your own LLM function (any str → str callable works).

import anthropic

client = anthropic.Anthropic()

def my_llm(prompt: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1000,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

tracker = SignalTracker(llm_fn=my_llm)

claims = tracker.extract_claims(transcript, source=analyst, use_llm=True)

The integration is model‑agnostic – OpenAI, Anthropic, Gemini, local models, etc.

Multi‑model consensus verification

In production at Crene we don’t trust a single model. We run several and take a weighted vote.

tracker.verify_with_consensus(
    claim,
    [
        {"outcome": "correct", "verifier": "ai:claude", "confidence": 0.90},
        {"outcome": "correct", "verifier": "ai:gpt-4",  "confidence": 0.85},
        {"outcome": "wrong",   "verifier": "ai:gemini", "confidence": 0.60},
    ],
)

Result: "correct" – weighted consensus wins.
Outcomes are weighted by confidence scores; a majority of high‑confidence agreements overrides a low‑confidence dissent.
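The voting logic can be sketched in a few lines — sum confidence per outcome and take the heaviest bucket (an illustration of the idea, not the library's exact tie-breaking rules):

```python
from collections import defaultdict

def weighted_consensus(votes):
    # Each outcome's bucket accumulates the confidence of its voters;
    # the outcome with the largest total wins.
    totals = defaultdict(float)
    for vote in votes:
        totals[vote["outcome"]] += vote["confidence"]
    return max(totals, key=totals.get)

votes = [
    {"outcome": "correct", "verifier": "ai:claude", "confidence": 0.90},
    {"outcome": "correct", "verifier": "ai:gpt-4",  "confidence": 0.85},
    {"outcome": "wrong",   "verifier": "ai:gemini", "confidence": 0.60},
]
print(weighted_consensus(votes))  # correct (1.75 vs 0.60)
```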

Tamper detection

Every claim gets a SHA‑256 hash at creation time:

claim = tracker.add_claim(source, "Bitcoin to $200k by 2025")
print(claim.content_hash)  # a1b2c3d4...

Later, verify nothing was changed:

claim.verify_integrity()  # True

If someone modifies the text…

claim.text = "I never said that"
claim.verify_integrity()  # False — hash mismatch
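The underlying check is plausibly just a stored digest compared against a recomputed one. A sketch, assuming the hash covers the claim text (the library may hash additional fields such as dates or source; that detail is an assumption here):

```python
import hashlib

def content_hash(text):
    # Hypothetical scheme: SHA-256 over the claim text at creation time.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

stored = content_hash("Bitcoin to $200k by 2025")
print(stored == content_hash("Bitcoin to $200k by 2025"))  # True
print(stored == content_hash("I never said that"))         # False
```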

Persistence

JSON (simple)

tracker.save("my_tracker.json")
tracker = SignalTracker.load("my_tracker.json")

SQLite (for larger datasets)

from signal_tracker.storage import SQLiteBackend

backend = SQLiteBackend("tracker.db")
backend.save_source(source)
backend.save_claim(claim)

Query

all_claims = backend.list_claims(source_id="elon-musk")

Architecture

signal-tracker/
├── tracker.py        # SignalTracker — main interface
├── models.py         # Source, Claim, Verification, ScoreSnapshot
├── scoring.py        # AccuracyScorer, QualityScorer
├── extractors.py     # ClaimExtractor (rules + LLM)
├── leaderboard.py    # Leaderboard engine
└── storage.py        # SQLiteBackend

Design principles

  • Zero required dependencies – stdlib only for core
  • Bring your own LLM – any provider works
  • Pluggable storage – JSON, SQLite, or build your own
  • Plain dataclasses – no ORM dependency anywhere

What’s Next

The roadmap depends on what the community wants:

Version   Feature
v0.2      REST API server (FastAPI)
v0.3      Auto‑ingest from RSS, Twitter, YouTube transcripts
v0.4      Dashboard UI (React)
v0.5      Prediction‑market integrations (Polymarket, Kalshi)
v0.6      Blockchain anchoring for tamper‑proof records

Try It

pip install signal-tracker

40 tests passing. MIT licensed. Contributions welcome.

The framework is free. The data is the moat.
