[Paper] Leveraging LLM Parametric Knowledge for Fact Checking without Retrieval

Published: March 5, 2026 at 01:42 PM EST
4 min read
Source: arXiv - 2603.05471v1

Overview

The paper tackles a practical pain point for AI‑powered agents: verifying the truth of a claim without having to fetch external documents first. By leaning on the parametric knowledge already baked into large language models (LLMs), the authors show that fact‑checking can become faster, more scalable, and less dependent on noisy retrieval pipelines.

Key Contributions

  • New task definition: Fact‑checking without any external retrieval, applicable to any natural‑language claim regardless of its source.
  • Comprehensive benchmark: An evaluation suite that stresses generalization across (i) long‑tail facts, (ii) diverse claim origins, (iii) multiple languages, and (iv) long‑form generated statements.
  • Empirical insight: Logit‑based confidence scores (e.g., “yes/no” probabilities) often lag behind methods that tap into the model’s hidden states.
  • INTRA method: A novel technique that measures interactions between internal token‑level representations, achieving state‑of‑the‑art performance across 9 datasets, 18 baselines, and 3 LLM families.
  • Broader vision: Demonstrates that retrieval‑free verification can complement existing pipelines, serve as a training reward signal, and be embedded directly into generation loops.
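To make the logit-based baseline from the contributions concrete: a minimal sketch of the "yes/no" confidence score the paper finds lagging behind hidden-state methods. The token ids and prompt wiring are assumptions for illustration, not the paper's exact setup.

```python
import math

def yes_no_confidence(logits, yes_id, no_id):
    """Logit-based factuality confidence: renormalize the model's
    final-position logits over just the 'yes' and 'no' answer tokens
    of a verification prompt (e.g., "Is this claim true? ").
    Token ids are model-specific and assumed here for illustration."""
    y = math.exp(logits[yes_id])
    n = math.exp(logits[no_id])
    return y / (y + n)  # P('yes') restricted to the two options
```

A claim is then labeled True when this probability crosses a threshold (commonly 0.5); the paper's point is that this single scalar discards most of the signal available in the model's internal representations.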

Methodology

  1. Task framing – Given a claim c, the model must output a factuality label (e.g., True/False/Uncertain) without consulting any external knowledge base.
  2. Evaluation framework – The authors curate a multi‑dimensional testbed:
    • Long‑tail: Rare facts that appear infrequently in pre‑training data.
    • Source variation: Claims extracted from news, scientific abstracts, social media, and LLM‑generated text.
    • Multilingual: English plus several other languages to probe cross‑lingual knowledge.
    • Long‑form: Paragraph‑level statements rather than isolated sentences.
  3. Baseline families
    • Logit‑based: Directly use the model’s output logits for a “yes/no” prompt.
    • Embedding‑based: Compare claim embeddings to stored fact embeddings.
    • Representation‑based: Probe hidden layers (e.g., attention maps, token‑wise vectors).
  4. INTRA design – Instead of a single scalar confidence, INTRA computes pairwise similarity between the claim’s token representations and the model’s internal “knowledge” vectors (learned during pre‑training). These interaction scores are aggregated with a lightweight classifier to produce the final verdict.
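The INTRA step above can be sketched as follows. This is a simplified reconstruction under stated assumptions: `token_reps` stands in for the claim's hidden-layer token vectors, `probe_vectors` for the internal "knowledge" directions, and the logistic-regression head for the paper's lightweight classifier; none of these names come from the paper itself.

```python
import numpy as np

def intra_features(token_reps, probe_vectors):
    """Pairwise cosine similarities between claim token representations
    (num_tokens x dim) and probe directions (num_probes x dim), then
    aggregated into a fixed-size interaction feature vector."""
    t = token_reps / np.linalg.norm(token_reps, axis=1, keepdims=True)
    p = probe_vectors / np.linalg.norm(probe_vectors, axis=1, keepdims=True)
    sims = t @ p.T                       # (num_tokens, num_probes)
    # Mean and max pooling over tokens keeps the feature size fixed
    # regardless of claim length.
    return np.concatenate([sims.mean(axis=0), sims.max(axis=0)])

def verdict(features, w, b):
    """Lightweight classifier head: logistic regression over the
    aggregated interaction scores, yielding P(claim is true)."""
    return 1.0 / (1.0 + np.exp(-(features @ w + b)))
```

The key design choice mirrored here is that the score depends on *interactions* between token-level representations rather than on a single output logit, which is what the paper credits for INTRA's advantage on long-form claims.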

Results & Findings

| Metric (average across datasets) | Logit-based | Embedding-based | Representation-based | INTRA |
| --- | --- | --- | --- | --- |
| Accuracy | 68.2 % | 71.5 % | 74.9 % | 80.3 % |
| F1 (True) | 66.7 % | 70.1 % | 73.4 % | 79.0 % |
| Robustness to long-tail (Δ) | −3.1 % | −1.8 % | −0.9 % | +0.4 % |
  • INTRA consistently outperformed all baselines, especially on low‑resource languages and long‑form claims.
  • Retrieval‑free models were 30‑40 % faster than retrieval‑augmented pipelines (no network I/O, no index look‑ups).
  • The gap between logit‑based and representation‑based methods grew larger as claim length increased, indicating that richer internal signals matter more for complex statements.

Practical Implications

  • Scalable verification services – SaaS platforms can embed INTRA directly into their API stack, offering instant fact‑checking without the latency of search engines.
  • On‑device AI – Edge devices (e.g., smartphones, IoT hubs) can run a compact LLM and still assess claim credibility offline, useful for privacy‑sensitive applications.
  • Training feedback loops – Since INTRA works without external data, it can serve as an automatic reward model for RL‑based fine‑tuning, encouraging LLMs to generate more truthful outputs.
  • Content moderation – Social media pipelines can flag potentially false statements in real time, even when the underlying claim references obscure or newly emerging facts.
  • Multilingual support – The benchmark shows that the approach generalizes across languages, opening doors for global fact‑checking tools without needing language‑specific retrieval corpora.
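The reward-signal idea from the implications above can be sketched as a thin aggregation layer. Everything here is a hypothetical illustration: `verify` stands in for any retrieval-free verifier (such as INTRA) returning P(claim is true), and the mean-score reward is one simple aggregation choice, not the paper's prescription.

```python
def factuality_reward(claims, verify):
    """Collapse per-claim factuality scores into one scalar reward
    suitable for RL fine-tuning. `verify` is a hypothetical callable
    returning the verifier's probability that a claim is true;
    an output with no checkable claims earns a neutral reward of 0."""
    if not claims:
        return 0.0
    scores = [verify(c) for c in claims]
    return sum(scores) / len(scores)
```

Because the verifier needs no external retrieval, this reward can be computed on every sampled generation during training, which is what makes the feedback-loop use case practical at scale.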

Limitations & Future Work

  • Knowledge cut‑off – The method is bounded by what the LLM has already memorized; truly novel facts (post‑training events) remain out of reach.
  • Model size dependency – Larger models provide richer internal representations, so performance may degrade on smaller, resource‑constrained LLMs.
  • Interpretability – While INTRA leverages hidden states, explaining why a claim is labeled false is still an open challenge.
  • Future directions suggested by the authors include: hybridizing retrieval‑free and retrieval‑augmented signals, extending the interaction mechanism to multimodal LLMs, and exploring continual‑learning strategies to update parametric knowledge without full re‑training.

Authors

  • Artem Vazhentsev
  • Maria Marina
  • Daniil Moskovskiy
  • Sergey Pletenev
  • Mikhail Seleznyov
  • Mikhail Salnikov
  • Elena Tutubalina
  • Vasily Konovalov
  • Irina Nikishina
  • Alexander Panchenko
  • Viktor Moskvoretskii

Paper Information

  • arXiv ID: 2603.05471v1
  • Categories: cs.CL, cs.AI
  • Published: March 5, 2026