[Paper] Leveraging LLM Parametric Knowledge for Fact Checking without Retrieval

Published: March 5, 2026 at 01:42 PM EST
4 min read
Source: arXiv - 2603.05471v1

Overview

The paper tackles a practical pain point for AI‑powered agents: verifying the truth of a claim without having to fetch external documents first. By leaning on the parametric knowledge already baked into large language models (LLMs), the authors show that fact‑checking can become faster, more scalable, and less dependent on noisy retrieval pipelines.

Key Contributions

  • New task definition: Fact‑checking without any external retrieval, applicable to any natural‑language claim regardless of its source.
  • Comprehensive benchmark: An evaluation suite that stresses generalization across (i) long‑tail facts, (ii) diverse claim origins, (iii) multiple languages, and (iv) long‑form generated statements.
  • Empirical insight: Logit‑based confidence scores (e.g., “yes/no” probabilities) often lag behind methods that tap into the model’s hidden states.
  • INTRA method: A novel technique that measures interactions between internal token‑level representations, achieving state‑of‑the‑art performance across 9 datasets, 18 baselines, and 3 LLM families.
  • Broader vision: Demonstrates that retrieval‑free verification can complement existing pipelines, serve as a training reward signal, and be embedded directly into generation loops.
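To make the logit-based baseline from the contributions concrete: a minimal sketch of the "yes/no" confidence score the paper finds lagging behind hidden-state methods. The token ids and prompt wiring are assumptions for illustration, not the paper's exact setup.

```python
import math

def yes_no_confidence(logits, yes_id, no_id):
    """Logit-based factuality confidence: renormalize the model's
    final-position logits over just the 'yes' and 'no' answer tokens
    of a verification prompt (e.g., "Is this claim true? ").
    Token ids are model-specific and assumed here for illustration."""
    y = math.exp(logits[yes_id])
    n = math.exp(logits[no_id])
    return y / (y + n)  # P('yes') restricted to the two options
```

A claim is then labeled True when this probability crosses a threshold (commonly 0.5); the paper's point is that this single scalar discards most of the signal available in the model's internal representations.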

Methodology

  1. Task framing – Given a claim c, the model must output a factuality label (e.g., True/False/Uncertain) without consulting any external knowledge base.
  2. Evaluation framework – The authors curate a multi‑dimensional testbed:
    • Long‑tail: Rare facts that appear infrequently in pre‑training data.
    • Source variation: Claims extracted from news, scientific abstracts, social media, and LLM‑generated text.
    • Multilingual: English plus several other languages to probe cross‑lingual knowledge.
    • Long‑form: Paragraph‑level statements rather than isolated sentences.
  3. Baseline families
    • Logit‑based: Directly use the model’s output logits for a “yes/no” prompt.
    • Embedding‑based: Compare claim embeddings to stored fact embeddings.
    • Representation‑based: Probe hidden layers (e.g., attention maps, token‑wise vectors).
  4. INTRA design – Instead of a single scalar confidence, INTRA computes pairwise similarity between the claim’s token representations and the model’s internal “knowledge” vectors (learned during pre‑training). These interaction scores are aggregated with a lightweight classifier to produce the final verdict.
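The INTRA step above can be sketched as follows. This is a simplified reconstruction under stated assumptions: `token_reps` stands in for the claim's hidden-layer token vectors, `probe_vectors` for the internal "knowledge" directions, and the logistic-regression head for the paper's lightweight classifier; none of these names come from the paper itself.

```python
import numpy as np

def intra_features(token_reps, probe_vectors):
    """Pairwise cosine similarities between claim token representations
    (num_tokens x dim) and probe directions (num_probes x dim), then
    aggregated into a fixed-size interaction feature vector."""
    t = token_reps / np.linalg.norm(token_reps, axis=1, keepdims=True)
    p = probe_vectors / np.linalg.norm(probe_vectors, axis=1, keepdims=True)
    sims = t @ p.T                       # (num_tokens, num_probes)
    # Mean and max pooling over tokens keeps the feature size fixed
    # regardless of claim length.
    return np.concatenate([sims.mean(axis=0), sims.max(axis=0)])

def verdict(features, w, b):
    """Lightweight classifier head: logistic regression over the
    aggregated interaction scores, yielding P(claim is true)."""
    return 1.0 / (1.0 + np.exp(-(features @ w + b)))
```

The key design choice mirrored here is that the score depends on *interactions* between token-level representations rather than on a single output logit, which is what the paper credits for INTRA's advantage on long-form claims.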

Results & Findings

| Metric (average across datasets) | Logit-based | Embedding-based | Representation-based | INTRA |
| --- | --- | --- | --- | --- |
| Accuracy | 68.2 % | 71.5 % | 74.9 % | 80.3 % |
| F1 (True) | 66.7 % | 70.1 % | 73.4 % | 79.0 % |
| Robustness to long-tail (Δ) | −3.1 % | −1.8 % | −0.9 % | +0.4 % |
  • INTRA consistently outperformed all baselines, especially on low‑resource languages and long‑form claims.
  • Retrieval‑free models were 30‑40 % faster than retrieval‑augmented pipelines (no network I/O, no index look‑ups).
  • The gap between logit‑based and representation‑based methods grew larger as claim length increased, indicating that richer internal signals matter more for complex statements.

Practical Implications

  • Scalable verification services – SaaS platforms can embed INTRA directly into their API stack, offering instant fact‑checking without the latency of search engines.
  • On‑device AI – Edge devices (e.g., smartphones, IoT hubs) can run a compact LLM and still assess claim credibility offline, useful for privacy‑sensitive applications.
  • Training feedback loops – Since INTRA works without external data, it can serve as an automatic reward model for RL‑based fine‑tuning, encouraging LLMs to generate more truthful outputs.
  • Content moderation – Social media pipelines can flag potentially false statements in real time, even when the underlying claim references obscure or newly emerging facts.
  • Multilingual support – The benchmark shows that the approach generalizes across languages, opening doors for global fact‑checking tools without needing language‑specific retrieval corpora.
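The reward-signal idea from the implications above can be sketched as a thin aggregation layer. Everything here is a hypothetical illustration: `verify` stands in for any retrieval-free verifier (such as INTRA) returning P(claim is true), and the mean-score reward is one simple aggregation choice, not the paper's prescription.

```python
def factuality_reward(claims, verify):
    """Collapse per-claim factuality scores into one scalar reward
    suitable for RL fine-tuning. `verify` is a hypothetical callable
    returning the verifier's probability that a claim is true;
    an output with no checkable claims earns a neutral reward of 0."""
    if not claims:
        return 0.0
    scores = [verify(c) for c in claims]
    return sum(scores) / len(scores)
```

Because the verifier needs no external retrieval, this reward can be computed on every sampled generation during training, which is what makes the feedback-loop use case practical at scale.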

Limitations & Future Work

  • Knowledge cut‑off – The method is bounded by what the LLM has already memorized; truly novel facts (post‑training events) remain out of reach.
  • Model size dependency – Larger models provide richer internal representations, so performance may degrade on smaller, resource‑constrained LLMs.
  • Interpretability – While INTRA leverages hidden states, explaining why a claim is labeled false is still an open challenge.
  • Future directions suggested by the authors include: hybridizing retrieval‑free and retrieval‑augmented signals, extending the interaction mechanism to multimodal LLMs, and exploring continual‑learning strategies to update parametric knowledge without full re‑training.

Authors

  • Artem Vazhentsev
  • Maria Marina
  • Daniil Moskovskiy
  • Sergey Pletenev
  • Mikhail Seleznyov
  • Mikhail Salnikov
  • Elena Tutubalina
  • Vasily Konovalov
  • Irina Nikishina
  • Alexander Panchenko
  • Viktor Moskvoretskii

Paper Information

  • arXiv ID: 2603.05471v1
  • Categories: cs.CL, cs.AI
  • Published: March 5, 2026