I track 420 prediction sources with AI. Here's the open-source framework.
Source: Dev.to
The problem
There is no standard infrastructure for prediction tracking.
| Existing ecosystem | What it does |
|---|---|
| LangChain | Agents |
| HuggingFace | Models |
| Supabase | Back‑ends |
But nothing answers the question:
“Who predicted what, when did they say it, and were they right?”
Signal Tracker solves that.
Install
```bash
pip install signal-tracker
```
Zero dependencies – stdlib only.
Python 3.10+.
5‑minute walkthrough
Track sources and claims
```python
from signal_tracker import SignalTracker
from datetime import date

tracker = SignalTracker()

# Add sources
elon = tracker.add_source("Elon Musk", source_type="person", category="tech")
cramer = tracker.add_source("Jim Cramer", source_type="person", category="finance")
imf = tracker.add_source("IMF", source_type="institution", category="economics")

# Add predictions (capture the returned claims so they can be verified later)
claim1 = tracker.add_claim(
    source=elon,
    text="Tesla will achieve full self-driving by end of 2025",
    target_date=date(2025, 12, 31),
)
claim2 = tracker.add_claim(
    source=cramer,
    text="Netflix will hit $800 by Q2 2025",
    target_date=date(2025, 6, 30),
)
claim3 = tracker.add_claim(
    source=imf,
    text="Global GDP growth will reach 3.2% in 2025",
    target_date=date(2025, 12, 31),
)
```
Verify outcomes
```python
tracker.verify(claim1, outcome="wrong", reasoning="FSD not achieved by deadline")
tracker.verify(claim2, outcome="correct", reasoning="Netflix reached $820 in May")
tracker.verify(claim3, outcome="partial", reasoning="GDP grew 2.9%, close but below target")
```
Build leaderboards
```python
board = tracker.leaderboard(min_claims=3)
for entry in board.top_accurate:
    print(f"{entry.rank}. {entry.source.name}: {entry.score.accuracy_score}%")
```
Other useful views:
- `board.worst_accurate` – bottom performers
- `board.biggest_risers` – improving fast
- `board.biggest_fallers` – getting worse
- `board.notable_wrongs` – high-profile misses
The scoring system
Accuracy scoring
Simple percentage-based accuracy, with nuance:
- Partial correctness weighting – configurable (default 0.5, i.e., a partial hit counts as half)
- Minimum claim threshold – sources need at least 3 resolved claims for a meaningful score
- Time-windowed scoring – accuracy over 30-day, 90-day, 12-month, and all-time windows, separately
```python
windows = tracker.accuracy_scorer.score_windowed(claims, source_id=source.id)
for period, snapshot in windows.items():
    print(f"  {period}: {snapshot.accuracy_score}%")
```
Recency‑weighted scoring
More recent predictions matter more. Uses exponential decay with a configurable half‑life:
```python
from signal_tracker.scoring import AccuracyConfig

config = AccuracyConfig(recency_half_life_days=90)
tracker = SignalTracker(accuracy_config=config)
```
With a 90-day half-life, a prediction from last week carries roughly 16× the weight of one from a year ago (the year-old claim has decayed through about four half-lives). This surfaces sources that were historically good but have recently fallen off.
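Assuming pure exponential decay, the weight is easy to sketch; `recency_weight` below is illustrative, not the library's function:

```python
def recency_weight(age_days: float, half_life_days: float = 90.0) -> float:
    """Weight halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)

print(recency_weight(7))    # ~0.95 (last week)
print(recency_weight(365))  # ~0.06 (a year ago)
```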
Claim‑quality scoring
Not all predictions are created equal. “Things will get better eventually” is not the same as “Bitcoin will reach $150k by Q4 2025.”
The quality scorer rates each claim 0‑100 based on the table below.
| Factor | Weight | What it checks |
|---|---|---|
| Time‑bound | 30 % | Has a specific deadline? |
| Measurable | 30 % | Has numeric targets? |
| Falsifiable | 20 % | Clear success/failure criteria? |
| Recency | 20 % | How recent is the claim? |
```python
from signal_tracker import QualityScorer

scorer = QualityScorer()
score = scorer.score(claim)  # e.g., 87.5 (highly trackable)
high_quality = [c for c in claims if scorer.is_high_quality(c)]
```
The scorer uses regex patterns to detect prediction language, dollar amounts, percentages, date references, and hedge words. Vague language (“might”, “could”, “eventually”) gets penalized.
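As a rough illustration of how such a rubric can work (the patterns, weights, and `quality_score` helper below are hypothetical, not the library's actual regexes):

```python
import re

# Hypothetical re-implementation of the weighted rubric above
DEADLINE = re.compile(r"\b(by|before|within|in)\b.*\b(q[1-4]|\d{4}|month|week|day|year)", re.I)
NUMERIC = re.compile(r"(\$[\d,.]+[kmb]?|\d+(\.\d+)?\s*%|\b\d[\d,]*\b)", re.I)
HEDGE = re.compile(r"\b(might|could|may|eventually|someday|possibly)\b", re.I)

def quality_score(text: str, age_days: int = 0) -> float:
    time_bound = 30 * bool(DEADLINE.search(text))    # specific deadline?
    measurable = 30 * bool(NUMERIC.search(text))     # numeric target?
    falsifiable = 20 * (not HEDGE.search(text))      # no hedge words?
    recency = 20 * max(0.0, 1 - age_days / 365)      # linear decay over a year
    return time_bound + measurable + falsifiable + recency

print(quality_score("Bitcoin will reach $150k by Q4 2025"))  # 100.0
print(quality_score("Things might get better eventually"))   # 20.0
```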
Extracting predictions from text
Rule‑based (fast, no API calls)
```python
text = """
In his latest interview, the CEO predicted that revenue would
exceed $10 billion by Q2 2025. He also forecast that the company
would reach 100 million users within 18 months.
"""

claims = tracker.extract_claims(text, source=ceo)
for claim in claims:
    print(f"  {claim.text}")
    print(f"  Target: {claim.target_date}")
    print(f"  Category: {claim.category}")
    print(f"  Quality: {claim.quality_score}")
```
LLM‑powered (more accurate)
Bring your own LLM function (any str → str callable works).
```python
import anthropic

client = anthropic.Anthropic()

def my_llm(prompt: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1000,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

tracker = SignalTracker(llm_fn=my_llm)
claims = tracker.extract_claims(transcript, source=analyst, use_llm=True)
```
The integration is model‑agnostic – OpenAI, Anthropic, Gemini, local models, etc.
Multi‑model consensus verification
In production at Crene we don’t trust a single model. We run several and take a weighted vote.
```python
tracker.verify_with_consensus(
    claim,
    [
        {"outcome": "correct", "verifier": "ai:claude", "confidence": 0.90},
        {"outcome": "correct", "verifier": "ai:gpt-4", "confidence": 0.85},
        {"outcome": "wrong", "verifier": "ai:gemini", "confidence": 0.60},
    ],
)
```
Result: "correct" – weighted consensus wins.
Outcomes are weighted by confidence scores; a majority of high‑confidence agreements overrides a low‑confidence dissent.
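The vote itself reduces to a confidence-weighted sum per outcome. A sketch of the idea (the real `verify_with_consensus` may add tie-breaking or thresholds):

```python
from collections import defaultdict

def weighted_consensus(votes):
    """Sum confidence per outcome; the outcome with the largest total wins."""
    totals = defaultdict(float)
    for v in votes:
        totals[v["outcome"]] += v["confidence"]
    return max(totals, key=totals.get)

votes = [
    {"outcome": "correct", "confidence": 0.90},
    {"outcome": "correct", "confidence": 0.85},
    {"outcome": "wrong", "confidence": 0.60},
]
print(weighted_consensus(votes))  # correct  (1.75 vs 0.60)
```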
Tamper detection
Every claim gets a SHA‑256 hash at creation time:
```python
claim = tracker.add_claim(source, "Bitcoin to $200k by 2025")
print(claim.content_hash)  # a1b2c3d4...
```
Later, verify nothing was changed:
```python
claim.verify_integrity()  # True
```
If someone modifies the text…
```python
claim.text = "I never said that"
claim.verify_integrity()  # False: hash mismatch
```
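The underlying mechanism is plain `hashlib`; here is a simplified stand-in (the real scheme may hash metadata such as source and target date along with the text):

```python
import hashlib

def content_hash(text: str) -> str:
    """SHA-256 hex digest of the claim text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

original = content_hash("Bitcoin to $200k by 2025")
tampered = content_hash("I never said that")
print(original == tampered)  # False: any edit changes the digest
```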
Persistence
JSON (simple)
```python
tracker.save("my_tracker.json")
tracker = SignalTracker.load("my_tracker.json")
```
SQLite (for larger datasets)
```python
from signal_tracker.storage import SQLiteBackend

backend = SQLiteBackend("tracker.db")
backend.save_source(source)
backend.save_claim(claim)
```
Query:
```python
all_claims = backend.list_claims(source_id="elon-musk")
```
Architecture
```
signal-tracker/
├── tracker.py      # SignalTracker — main interface
├── models.py       # Source, Claim, Verification, ScoreSnapshot
├── scoring.py      # AccuracyScorer, QualityScorer
├── extractors.py   # ClaimExtractor (rules + LLM)
├── leaderboard.py  # Leaderboard engine
└── storage.py      # SQLiteBackend
```
Design principles
- Zero required dependencies – stdlib only for core
- Bring your own LLM – any provider works
- Pluggable storage – JSON, SQLite, or build your own
- Plain dataclasses – no ORM dependency anywhere
What’s Next
The roadmap depends on what the community wants:
| Version | Feature |
|---|---|
| v0.2 | REST API server (FastAPI) |
| v0.3 | Auto‑ingest from RSS, Twitter, YouTube transcripts |
| v0.4 | Dashboard UI (React) |
| v0.5 | Prediction‑market integrations (Polymarket, Kalshi) |
| v0.6 | Blockchain anchoring for tamper‑proof records |
Try It
```bash
pip install signal-tracker
```
- GitHub: https://github.com/Creneinc/signal-tracker
- PyPI: https://pypi.org/project/signal-tracker
- Production version: https://crene.com (see the Signals tab)
40 tests passing. MIT licensed. Contributions welcome.
The framework is free. The data is the moat.