[Paper] Low-Cost Black-Box Detection of LLM Hallucinations via Dynamical System Prediction

Published: May 6, 2026 at 01:07 PM EDT

Source: arXiv - 2605.05134v1

Overview

Large Language Models (LLMs) are great at producing fluent text, but they often slip into “hallucinations” – statements that sound plausible yet are factually wrong. The paper Low‑Cost Black‑Box Detection of LLM Hallucinations via Dynamical System Prediction introduces a novel way to spot these errors without the heavy compute or external knowledge bases that most existing detectors need. By treating an LLM as a black‑box dynamical system and applying concepts from Koopman operator theory, the authors achieve state‑of‑the‑art detection with a single forward pass.

Key Contributions

Black‑box dynamical‑system view: Re‑frames LLM output sequences as trajectories in a high‑dimensional latent state space, sidestepping the need to peek inside the model.
Koopman‑based transition modeling: Learns linear operators that approximate the evolution of factual vs. hallucinated response trajectories, enabling a cheap prediction‑error score.
Differential residual score: Computes the mismatch between observed token embeddings and the two regime‑specific Koopman predictions, yielding a robust hallucination indicator.
Preference‑aware calibration: Introduces a lightweight, demonstration‑driven threshold‑tuning step that lets users bias the detector toward higher precision or recall depending on domain risk.
Empirical validation: Shows competitive or superior performance on three benchmark datasets while cutting inference cost by up to 70 % compared to sampling‑based detectors.

Methodology

Embedding the response: Each token (or sub‑sentence) generated by the LLM is passed through a separate, fixed embedding model (e.g., a sentence‑transformer) to obtain a high‑dimensional vector.
Trajectory construction: The sequence of vectors forms a time‑ordered trajectory \(\{x_t\}\) that is treated as an observable output of an underlying hidden state system.
Koopman operator fitting: Using a modest set of labeled examples (factual vs. hallucinated), the authors fit two linear operators \(K_{\text{fact}}\) and \(K_{\text{hall}}\) that best predict the next embedding:
\[ \hat{x}_{t+1} = K\,x_t \]
Separate operators capture the distinct dynamics of truthful and untruthful generation regimes.
Residual scoring: For a new LLM response, the method computes the prediction error under each operator:
\[ r_{\text{fact}} = \|x_{t+1} - K_{\text{fact}}\,x_t\|,\quad r_{\text{hall}} = \|x_{t+1} - K_{\text{hall}}\,x_t\| \]
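The differential residual \(s = r_{\text{fact}} - r_{\text{hall}}\) serves as the hallucination score: a positive value means the hallucination operator predicts the trajectory better than the factual operator does, which indicates a higher likelihood of hallucination.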
Calibration layer: A small validation set (e.g., 50–100 examples) is used to pick a decision threshold that respects a user‑specified trade‑off (e.g., prioritize precision for medical advice). This step is inexpensive and can be re‑run when domain requirements shift.
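The fitting and scoring steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the operators are fit by ridge-regularized least squares (a common way to estimate a linear one-step predictor, but the summary does not say which estimator the paper uses), per-step residuals are averaged over the response, and the score is taken as r_fact minus r_hall so that positive values point toward hallucination. The trajectories are synthetic stand-ins generated from two known linear systems; in practice each trajectory would be the (T, d) array of embeddings for one labeled response. All function names here are hypothetical.

```python
import numpy as np

def fit_operator(trajectories, ridge=1e-3):
    """Ridge least-squares fit of K such that x_{t+1} ≈ K @ x_t (assumed estimator)."""
    X = np.vstack([t[:-1] for t in trajectories])  # current states, shape (N, d)
    Y = np.vstack([t[1:] for t in trajectories])   # next states, shape (N, d)
    d = X.shape[1]
    # In row form Y ≈ X @ K.T, so solve (X^T X + ridge * I) K^T = X^T Y.
    K_T = np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ Y)
    return K_T.T

def mean_residual(traj, K):
    """Average one-step prediction error ||x_{t+1} - K x_t|| along one trajectory."""
    return float(np.mean(np.linalg.norm(traj[1:] - traj[:-1] @ K.T, axis=1)))

def hallucination_score(traj, K_fact, K_hall):
    """Positive when the hallucination operator predicts the trajectory better."""
    return mean_residual(traj, K_fact) - mean_residual(traj, K_hall)

# --- Synthetic stand-in for embedded responses (NOT real LLM data) ---
def rotation_blocks(theta, d=4):
    """Block-diagonal matrix of 2x2 rotations; used only to create toy dynamics."""
    c, s = np.cos(theta), np.sin(theta)
    A = np.zeros((d, d))
    for i in range(0, d, 2):
        A[i:i + 2, i:i + 2] = [[c, -s], [s, c]]
    return A

def simulate(A, steps, rng, noise=0.01):
    """Roll out x_{t+1} = A x_t + small Gaussian noise from a random start."""
    x = rng.normal(size=A.shape[0])
    traj = [x]
    for _ in range(steps - 1):
        x = A @ x + noise * rng.normal(size=A.shape[0])
        traj.append(x)
    return np.array(traj)

rng = np.random.default_rng(0)
A_fact = 0.95 * rotation_blocks(0.3)   # toy "factual" dynamics
A_hall = 0.95 * rotation_blocks(1.2)   # toy "hallucinated" dynamics

K_fact = fit_operator([simulate(A_fact, 12, rng) for _ in range(20)])
K_hall = fit_operator([simulate(A_hall, 12, rng) for _ in range(20)])

score_on_fact = hallucination_score(simulate(A_fact, 12, rng), K_fact, K_hall)
score_on_hall = hallucination_score(simulate(A_hall, 12, rng), K_fact, K_hall)
print(f"score on a factual-regime trajectory:      {score_on_fact:+.3f}")
print(f"score on a hallucinated-regime trajectory: {score_on_hall:+.3f}")
```

With real embeddings of several hundred dimensions and only a modest labeled set, the unregularized least-squares problem can be underdetermined, which is why the sketch includes a ridge term. Whether the paper regularizes in this way is not stated in this summary.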

Results & Findings

Benchmark	Metric (F1)	Baseline (sampling)	Proposed method
FactBench (news)	0.84	0.78	0.86
MedHall (clinical notes)	0.79	0.71	0.81
CodeHall (programming Q&A)	0.82	0.75	0.84

Resource savings: Average inference time per query dropped from ~120 ms (5‑sample consistency check) to ~35 ms, a ~70 % reduction.
Robustness to model size: The detector works across LLMs ranging from 7 B to 175 B parameters with only minor performance variance.
Calibration impact: Adjusting the threshold for high‑precision mode raised precision from 0.78 to 0.92 while only modestly lowering recall (0.68 → 0.62), demonstrating practical control over risk tolerance.
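To make the precision/recall trade-off concrete, here is a hedged sketch of one simple way to choose a threshold from a small labeled validation set: sweep the candidate thresholds in ascending order and keep the lowest one whose precision meets a user-specified floor, which also gives the highest recall among the qualifying thresholds. This is an illustrative stand-in for the paper's preference-aware calibration, whose exact procedure is not described here; `pick_threshold` and the toy scores and labels are hypothetical.

```python
import numpy as np

def pick_threshold(scores, labels, min_precision):
    """Lowest threshold whose validation precision is >= min_precision.

    A response is flagged when score >= threshold. Recall can only fall as the
    threshold rises, so the lowest qualifying threshold keeps the most recall.
    Returns (threshold, precision, recall), or None if no threshold qualifies.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)    # True = hallucinated
    if not labels.any():
        raise ValueError("validation set needs at least one hallucinated example")
    for thr in np.unique(scores):              # unique values, sorted ascending
        flagged = scores >= thr                # never empty: thr is one of the scores
        tp = np.sum(flagged & labels)
        precision = tp / flagged.sum()
        recall = tp / labels.sum()
        if precision >= min_precision:
            return float(thr), float(precision), float(recall)
    return None

# Toy validation set (made-up numbers, for illustration only).
val_scores = [-0.9, -0.5, -0.2, 0.1, 0.3, 0.4, 0.7, 0.9]
val_labels = [0, 0, 1, 0, 1, 0, 1, 1]

for floor in (0.6, 0.75, 0.9):
    print(floor, pick_threshold(val_scores, val_labels, floor))
```

Raising the precision floor pushes the chosen threshold up and lowers recall, which is the same qualitative behavior as the high-precision mode reported above.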

Practical Implications

Plug‑and‑play safety layer: Because the method only needs the LLM’s output and a separate embedding model, it can be wrapped around any existing API (OpenAI, Anthropic, etc.) without retraining the LLM.
Low‑cost monitoring for production: SaaS platforms that serve millions of queries can add hallucination detection with negligible additional GPU load, preserving latency budgets.
Domain‑specific risk management: The calibration step lets teams in regulated fields (healthcare, finance, legal) set stricter thresholds, aligning detection behavior with compliance requirements.
Developer tooling: IDE extensions or CI pipelines that automatically flag potentially hallucinated code snippets or documentation can integrate this detector to improve code‑review quality.
Open‑source friendliness: The approach relies on publicly available embedding models and simple linear algebra, making it easy to reproduce and extend in community projects.
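As a rough illustration of the plug-and-play idea, the sketch below wraps an arbitrary text-generation callable with a post-hoc check. Everything in it is an assumption made for illustration: `guarded_generate`, the sentence-splitting rule, and the `Checked` result type are hypothetical, and `fake_llm`, `toy_embed`, and the placeholder operators stand in for a real API call, a real embedding model, and operators fitted on labeled data. Responses with fewer than two segments are returned unscored, since a single embedding gives no transition to predict.

```python
import re
from dataclasses import dataclass
from typing import Optional

import numpy as np

@dataclass
class Checked:
    text: str
    score: Optional[float]   # None when the response is too short to score
    flagged: bool

def _mean_residual(traj, K):
    """Average one-step prediction error ||x_{t+1} - K x_t||."""
    return float(np.mean(np.linalg.norm(traj[1:] - traj[:-1] @ K.T, axis=1)))

def guarded_generate(prompt, generate, embed, K_fact, K_hall, threshold):
    """Call the LLM once, embed its sentences, and attach a hallucination flag."""
    text = generate(prompt)
    # Naive sentence split on terminal punctuation (an assumption, not the paper's rule).
    segments = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if len(segments) < 2:
        return Checked(text, None, False)
    traj = np.stack([embed(s) for s in segments])
    score = _mean_residual(traj, K_fact) - _mean_residual(traj, K_hall)
    return Checked(text, score, score > threshold)

# --- Stubs so the sketch runs without any external service or model ---
def fake_llm(prompt):
    return "Paris is the capital of France. It lies on the Seine. It hosts the Louvre."

def toy_embed(sentence):
    # Crude character statistics; a real system would use a sentence-embedding model.
    v = np.array([len(sentence), sentence.count(" "), sentence.count("e"), 1.0])
    return v / np.linalg.norm(v)

K_fact_stub = np.eye(4)          # placeholder, not a fitted operator
K_hall_stub = np.zeros((4, 4))   # placeholder, not a fitted operator

result = guarded_generate("Tell me about Paris.", fake_llm, toy_embed,
                          K_fact_stub, K_hall_stub, threshold=0.0)
print(result.flagged, result.score)
```

Because the check only consumes the returned text, the same wrapper shape applies to any provider; only the `generate` callable changes.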

Limitations & Future Work

Embedding dependence: The quality of the detection hinges on the chosen embedding model; poor semantic representations could blur the distinction between factual and hallucinated dynamics.
Limited to observable trajectories: Extremely short responses (e.g., single‑word answers) yield too few embedding transitions for a reliable Koopman prediction residual, reducing effectiveness in those cases.
Calibration data requirement: While modest, the need for a labeled demonstration set means the detector must be re‑calibrated when moving to a new domain with different hallucination patterns.
Future directions: The authors suggest exploring nonlinear Koopman extensions (e.g., kernel‑based operators) to capture richer dynamics, and integrating lightweight retrieval signals to further boost detection on edge‑case factual queries.

Authors

Dan Wilson
Mohamed Akrout

Paper Information

arXiv ID: 2605.05134v1
Categories: cs.LG, math.DS
Published: May 6, 2026
