[Paper] Logical Consistency as a Bridge: Improving LLM Hallucination Detection via Label Constraint Modeling between Responses and Self-Judgments

Published: May 5, 2026 at 12:53 PM EDT

Source: arXiv - 2605.03971v1

Overview

Large Language Models (LLMs) can generate impressively fluent text, but they sometimes “hallucinate” – i.e., produce statements that sound plausible yet are factually wrong. Detecting these hallucinations is crucial for any production‑grade AI system. The new LaaB framework (Logical Consistency‑as‑a‑Bridge) shows how to combine two complementary signals – the model’s internal uncertainty and its own self‑judgment – into a single, more reliable detector.

Key Contributions

Dual‑view detection: Introduces a unified architecture that jointly leverages neural uncertainty features and symbolic self‑judgments (e.g., “Is this answer correct?”).
Meta‑judgment mapping: Proposes a “meta‑judgment” step that translates the symbolic label back into the feature space, creating a logical bridge between the two views.
Mutual learning scheme: Implements a bidirectional consistency loss that forces the response side and the meta‑judgment side to agree (or intentionally disagree) based on the semantics of the self‑judgment.
Broad empirical validation: Evaluates LaaB on four public hallucination benchmarks, four different LLM backbones, and against eight strong baselines, consistently outperforming them.
Open‑source potential: The design is model‑agnostic, making it straightforward to plug into existing LLM pipelines.

Methodology

Generate response & self‑judgment – For a given query, the LLM first produces an answer, then is prompted to evaluate its own answer (e.g., “Is the answer correct? Yes/No”).
Extract neural features – The hidden states from the answer generation step are fed into a lightweight classifier that predicts hallucination probability (the response‑view).
Create meta‑judgment – The self‑judgment label (“Yes”/“No”) is encoded as a symbolic token and passed through a small embedding layer to produce a meta‑judgment feature vector that lives in the same space as the neural features.
Logical bridge & consistency loss – Because a “Yes” self‑judgment should align with a non‑hallucinated answer, LaaB enforces that the response‑view and meta‑judgment vectors are either identical (for “Yes”) or opposite (for “No”). This is realized with a contrastive loss that pulls matching pairs together and pushes mismatched pairs apart.
Joint training – The response classifier and the meta‑judgment encoder are trained together, allowing each to improve the other via the consistency signal.
Inference – At test time, the final hallucination score is a weighted blend of the response‑view probability and the meta‑judgment consistency score.
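
The consistency constraint and the inference-time blend described in the steps above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the cosine-based loss stands in for the contrastive loss the summary mentions, and the names `cosine`, `consistency_loss`, `blended_score`, and the default `weight=0.5` are all assumptions made for this example. It also assumes both scores passed to `blended_score` are already in [0, 1], with higher values meaning "more likely hallucinated".

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_u * norm_v)


def consistency_loss(response_vec, meta_vec, self_judgment):
    """Hypothetical consistency term (an assumption, not the paper's loss).

    "Yes": the two views should align, so the loss is 1 - cos (0 when identical).
    "No":  the two views should oppose, so the loss is 1 + cos (0 when opposite).
    """
    sim = cosine(response_vec, meta_vec)
    if self_judgment == "Yes":
        return 1.0 - sim
    if self_judgment == "No":
        return 1.0 + sim
    raise ValueError("self_judgment must be 'Yes' or 'No'")


def blended_score(response_prob, consistency_score, weight=0.5):
    """Weighted blend of the two views; `weight` is an assumed hyperparameter."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return weight * response_prob + (1.0 - weight) * consistency_score
```

With this formulation the loss is zero exactly when the vectors satisfy the constraint (same direction for "Yes", opposite direction for "No") and grows to 2 as they violate it. In a real training setup the same term would be computed on tensors and added to the classifier's own loss.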

Results & Findings

Dataset (Model)	Baseline Avg. F1	LaaB F1 (Δ vs. baseline)
TruthfulQA (GPT‑3.5)	71.2	78.9 (+7.7)
HaluEval (LLaMA‑2)	68.5	76.3 (+7.8)
WikiFact (Claude)	73.0	80.5 (+7.5)
OpenFact (Mistral)	69.8	77.1 (+7.3)

LaaB consistently beats the best single‑view detectors (uncertainty‑only or self‑judgment‑only) by 5–9 absolute F1 points.
Ablation studies show that removing the meta‑judgment bridge drops performance by ~6 %, confirming its central role.
The mutual learning loss improves calibration: predicted probabilities align more closely with actual hallucination rates, reducing over‑confident false positives.
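
To make the calibration claim concrete, one common way to measure how well predicted probabilities match observed rates is expected calibration error (ECE). The summary does not say which calibration metric the authors used, so the sketch below is only an illustration of the idea; the function name and the choice of 10 equal-width bins are assumptions.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """ECE with equal-width bins.

    probs:  predicted hallucination probabilities in [0, 1]
    labels: 1 if the response was actually hallucinated, else 0
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # Clamp so that p == 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(p for p, _ in bucket) / len(bucket)
        frac_pos = sum(y for _, y in bucket) / len(bucket)
        # Weight each bin's gap by the share of samples it holds.
        ece += (len(bucket) / total) * abs(avg_conf - frac_pos)
    return ece
```

A detector that predicts 0.9 for a group of responses of which only half are hallucinated contributes a gap of 0.4 for that bin; a lower ECE means the scores can be read more directly as hallucination rates.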

Practical Implications

Safer AI assistants: Integrating LaaB into chatbots or code‑generation tools can flag dubious answers before they reach users, enabling fallback strategies (e.g., ask for clarification or cite sources).
Content moderation pipelines: Automated fact‑checking services can use LaaB as a pre‑filter to prioritize human review of high‑risk outputs.
Model‑agnostic deployment: Because LaaB only requires a short self‑judgment prompt and a lightweight classifier, it can be added on top of an existing LLM without retraining the base model, provided the hidden states used by the response‑view classifier are accessible.
Reduced hallucination cost: Early detection means fewer expensive post‑hoc verification steps (like external knowledge retrieval), saving compute and latency in production systems.
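
As an illustration of the flag-and-fallback pattern mentioned above, a detector score can gate whether an answer is delivered or replaced by a fallback. This is a hypothetical sketch: `route_answer`, the 0.5 threshold, and the fallback wording are assumptions for the example and are not taken from the paper.

```python
def route_answer(answer, hallucination_score, threshold=0.5):
    """Deliver the answer, or fall back when the detector score is too high.

    hallucination_score is assumed to be in [0, 1], higher meaning riskier.
    The threshold is an assumed value that would need tuning per application.
    """
    if not 0.0 <= hallucination_score <= 1.0:
        raise ValueError("hallucination_score must be in [0, 1]")
    if hallucination_score >= threshold:
        return {
            "action": "fallback",
            "message": "I am not confident in this answer. Could you clarify or provide a source?",
            "score": hallucination_score,
        }
    return {"action": "deliver", "message": answer, "score": hallucination_score}
```

For example, `route_answer("Paris", 0.2)` returns a "deliver" action with the original answer, while a score of 0.9 triggers the fallback. A human-review queue could be driven by the same score, for instance by sending only "fallback" cases to reviewers.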

Limitations & Future Work

Dependence on self‑judgment quality: If the LLM’s self‑assessment is itself unreliable (e.g., on highly specialized domains), the bridge may propagate errors.
Prompt sensitivity: The phrasing of the self‑judgment prompt can affect the label distribution; more robust prompt engineering is needed.
Scalability to multi‑turn dialogs: Current experiments focus on single‑turn Q&A; extending LaaB to maintain logical consistency across conversational histories remains open.
Broader symbolic signals: Future work could incorporate additional symbolic cues (e.g., citation checks, logical entailment) to further enrich the bridge.

Authors

Hao Mi
Qiang Sheng
Shaofei Wang
Beizhe Hu
Yifan Sun
Zhengjia Wang
Hengqi Zeng
Yang Li
Danding Wang
Juan Cao

Paper Information

arXiv ID: 2605.03971v1
Categories: cs.CL
Published: May 5, 2026
