I track 420 prediction sources with AI. Here's the open-source framework.
Source: Dev.to
The problem
There is no standard infrastructure for prediction tracking.
| Existing ecosystem | What it does |
|---|---|
| LangChain | Agents |
| HuggingFace | Models |
| Supabase | Back‑ends |
But nothing answers the question:
“Who predicted what, when did they say it, and were they right?”
Signal Tracker solves that.
Install
```bash
pip install signal-tracker
```
Zero dependencies – stdlib only.
Python 3.10+.
5‑minute walkthrough
Track sources and claims
```python
from signal_tracker import SignalTracker
from datetime import date

tracker = SignalTracker()

# Add sources
elon = tracker.add_source("Elon Musk", source_type="person", category="tech")
cramer = tracker.add_source("Jim Cramer", source_type="person", category="finance")
imf = tracker.add_source("IMF", source_type="institution", category="economics")

# Add predictions (capture the returned claims so they can be verified later)
claim1 = tracker.add_claim(
    source=elon,
    text="Tesla will achieve full self-driving by end of 2025",
    target_date=date(2025, 12, 31),
)
claim2 = tracker.add_claim(
    source=cramer,
    text="Netflix will hit $800 by Q2 2025",
    target_date=date(2025, 6, 30),
)
claim3 = tracker.add_claim(
    source=imf,
    text="Global GDP growth will reach 3.2% in 2025",
    target_date=date(2025, 12, 31),
)
```
Verify outcomes
```python
tracker.verify(claim1, outcome="wrong", reasoning="FSD not achieved by deadline")
tracker.verify(claim2, outcome="correct", reasoning="Netflix reached $820 in May")
tracker.verify(claim3, outcome="partial", reasoning="GDP grew 2.9%, close but below target")
```
Build leaderboards
```python
board = tracker.leaderboard(min_claims=3)
for entry in board.top_accurate:
    print(f"{entry.rank}. {entry.source.name}: {entry.score.accuracy_score}%")
```
Other useful views:
- `board.worst_accurate` – bottom performers
- `board.biggest_risers` – improving fast
- `board.biggest_fallers` – getting worse
- `board.notable_wrongs` – high-profile misses
The scoring system
Accuracy scoring
Simple percentage-based accuracy, with nuance:
- Partial correctness weighting – configurable (default 0.5, i.e., a partial hit counts as half)
- Minimum claim threshold – sources need at least 3 resolved claims for a meaningful score
- Time-windowed scoring – accuracy over 30-day, 90-day, 12-month, and all-time windows, separately
```python
windows = tracker.accuracy_scorer.score_windowed(claims, source_id=source.id)
for period, snapshot in windows.items():
    print(f"  {period}: {snapshot.accuracy_score}%")
```
Recency‑weighted scoring
More recent predictions matter more. Uses exponential decay with a configurable half‑life:
```python
from signal_tracker.scoring import AccuracyConfig

config = AccuracyConfig(recency_half_life_days=90)
tracker = SignalTracker(accuracy_config=config)
```
With a 90-day half-life, a prediction from last week carries roughly 16× the weight of one from a year ago (the year-old claim has decayed through about four half-lives). This surfaces sources that were historically good but have recently fallen off.
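Assuming pure exponential decay, the weight is easy to sketch; `recency_weight` below is illustrative, not the library's function:

```python
def recency_weight(age_days: float, half_life_days: float = 90.0) -> float:
    """Weight halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)

print(recency_weight(7))    # ~0.95 (last week)
print(recency_weight(365))  # ~0.06 (a year ago)
```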
Claim‑quality scoring
Not all predictions are created equal. “Things will get better eventually” is not the same as “Bitcoin will reach $150k by Q4 2025.”
The quality scorer rates each claim 0‑100 based on the table below.
| Factor | Weight | What it checks |
|---|---|---|
| Time‑bound | 30 % | Has a specific deadline? |
| Measurable | 30 % | Has numeric targets? |
| Falsifiable | 20 % | Clear success/failure criteria? |
| Recency | 20 % | How recent is the claim? |
```python
from signal_tracker import QualityScorer

scorer = QualityScorer()
score = scorer.score(claim)  # e.g., 87.5 (highly trackable)
high_quality = [c for c in claims if scorer.is_high_quality(c)]
```
The scorer uses regex patterns to detect prediction language, dollar amounts, percentages, date references, and hedge words. Vague language (“might”, “could”, “eventually”) gets penalized.
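As a rough illustration of how such a rubric can work (the patterns, weights, and `quality_score` helper below are hypothetical, not the library's actual regexes):

```python
import re

# Hypothetical re-implementation of the weighted rubric above
DEADLINE = re.compile(r"\b(by|before|within|in)\b.*\b(q[1-4]|\d{4}|month|week|day|year)", re.I)
NUMERIC = re.compile(r"(\$[\d,.]+[kmb]?|\d+(\.\d+)?\s*%|\b\d[\d,]*\b)", re.I)
HEDGE = re.compile(r"\b(might|could|may|eventually|someday|possibly)\b", re.I)

def quality_score(text: str, age_days: int = 0) -> float:
    time_bound = 30 * bool(DEADLINE.search(text))    # specific deadline?
    measurable = 30 * bool(NUMERIC.search(text))     # numeric target?
    falsifiable = 20 * (not HEDGE.search(text))      # no hedge words?
    recency = 20 * max(0.0, 1 - age_days / 365)      # linear decay over a year
    return time_bound + measurable + falsifiable + recency

print(quality_score("Bitcoin will reach $150k by Q4 2025"))  # 100.0
print(quality_score("Things might get better eventually"))   # 20.0
```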
Extracting predictions from text
Rule‑based (fast, no API calls)
```python
text = """
In his latest interview, the CEO predicted that revenue would
exceed $10 billion by Q2 2025. He also forecast that the company
would reach 100 million users within 18 months.
"""

claims = tracker.extract_claims(text, source=ceo)
for claim in claims:
    print(f"  {claim.text}")
    print(f"  Target: {claim.target_date}")
    print(f"  Category: {claim.category}")
    print(f"  Quality: {claim.quality_score}")
```
LLM‑powered (more accurate)
Bring your own LLM function (any str → str callable works).
```python
import anthropic

client = anthropic.Anthropic()

def my_llm(prompt: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1000,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

tracker = SignalTracker(llm_fn=my_llm)
claims = tracker.extract_claims(transcript, source=analyst, use_llm=True)
```
The integration is model‑agnostic – OpenAI, Anthropic, Gemini, local models, etc.
Multi‑model consensus verification
In production at Crene we don’t trust a single model. We run several and take a weighted vote.
```python
tracker.verify_with_consensus(
    claim,
    [
        {"outcome": "correct", "verifier": "ai:claude", "confidence": 0.90},
        {"outcome": "correct", "verifier": "ai:gpt-4", "confidence": 0.85},
        {"outcome": "wrong", "verifier": "ai:gemini", "confidence": 0.60},
    ],
)
```
Result: "correct" – weighted consensus wins.
Outcomes are weighted by confidence scores; a majority of high‑confidence agreements overrides a low‑confidence dissent.
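The vote itself reduces to a confidence-weighted sum per outcome. A sketch of the idea (the real `verify_with_consensus` may add tie-breaking or thresholds):

```python
from collections import defaultdict

def weighted_consensus(votes):
    """Sum confidence per outcome; the outcome with the largest total wins."""
    totals = defaultdict(float)
    for v in votes:
        totals[v["outcome"]] += v["confidence"]
    return max(totals, key=totals.get)

votes = [
    {"outcome": "correct", "confidence": 0.90},
    {"outcome": "correct", "confidence": 0.85},
    {"outcome": "wrong", "confidence": 0.60},
]
print(weighted_consensus(votes))  # correct  (1.75 vs 0.60)
```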
Tamper detection
Every claim gets a SHA‑256 hash at creation time:
```python
claim = tracker.add_claim(source, "Bitcoin to $200k by 2025")
print(claim.content_hash)  # a1b2c3d4...
```
Later, verify nothing was changed:
```python
claim.verify_integrity()  # True
```
If someone modifies the text…
```python
claim.text = "I never said that"
claim.verify_integrity()  # False: hash mismatch
```
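The underlying mechanism is plain `hashlib`; here is a simplified stand-in (the real scheme may hash metadata such as source and target date along with the text):

```python
import hashlib

def content_hash(text: str) -> str:
    """SHA-256 hex digest of the claim text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

original = content_hash("Bitcoin to $200k by 2025")
tampered = content_hash("I never said that")
print(original == tampered)  # False: any edit changes the digest
```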
Persistence
JSON (simple)
```python
tracker.save("my_tracker.json")
tracker = SignalTracker.load("my_tracker.json")
```
SQLite (for larger datasets)
```python
from signal_tracker.storage import SQLiteBackend

backend = SQLiteBackend("tracker.db")
backend.save_source(source)
backend.save_claim(claim)
```
Query:
```python
all_claims = backend.list_claims(source_id="elon-musk")
```
Architecture
```
signal-tracker/
├── tracker.py      # SignalTracker — main interface
├── models.py       # Source, Claim, Verification, ScoreSnapshot
├── scoring.py      # AccuracyScorer, QualityScorer
├── extractors.py   # ClaimExtractor (rules + LLM)
├── leaderboard.py  # Leaderboard engine
└── storage.py      # SQLiteBackend
```
Design principles
- Zero required dependencies – stdlib only for core
- Bring your own LLM – any provider works
- Pluggable storage – JSON, SQLite, or build your own
- Plain dataclasses – no ORM dependency anywhere
What’s Next
The roadmap depends on what the community wants:
| Version | Feature |
|---|---|
| v0.2 | REST API server (FastAPI) |
| v0.3 | Auto‑ingest from RSS, Twitter, YouTube transcripts |
| v0.4 | Dashboard UI (React) |
| v0.5 | Prediction‑market integrations (Polymarket, Kalshi) |
| v0.6 | Blockchain anchoring for tamper‑proof records |
Try It
```bash
pip install signal-tracker
```
- GitHub: https://github.com/Creneinc/signal-tracker
- PyPI: https://pypi.org/project/signal-tracker
- Production version: https://crene.com (see the Signals tab)
40 tests passing. MIT licensed. Contributions welcome.
The framework is free. The data is the moat.