[Paper] Influential Training Data Retrieval for Explaining Verbalized Confidence of LLMs
Source: arXiv - 2601.10645v1
Overview
Large language models (LLMs) are increasingly asked to state how confident they are about the answers they generate, a feature that can boost user trust. Yet, research shows that these confidence statements are often misaligned with factual correctness—the models sound sure even when they’re wrong. This paper introduces TracVC, a technique that traces a model’s verbalized confidence back to the specific training examples that influenced it, shedding light on why LLMs sometimes over‑confidently “talk the talk” without “walking the walk.”
Key Contributions
- TracVC framework: Combines information retrieval with influence estimation to map a model’s confidence expression to the most influential training passages.
- Content groundness metric: A novel evaluation that quantifies how much a confidence statement is rooted in content‑relevant examples versus generic “confidence‑talk” snippets.
- Empirical study on OLMo and LLaMA: Demonstrates that a 13B-parameter OLMo model often draws on lexically unrelated confidence‑related data, indicating superficial mimicry rather than true grounding.
- Insight into training dynamics: Highlights a systemic issue where current pre‑training pipelines teach LLMs how to sound confident, not when confidence is warranted.
Methodology
- Data Retrieval: For each generated answer‑confidence pair, the authors retrieve a set of candidate training passages using a dense vector search (e.g., FAISS) over the original pre‑training corpus; a simplified end‑to‑end sketch follows this list.
- Influence Estimation: They apply a gradient‑based influence function (similar to Koh & Liang, 2017) to estimate how much each retrieved passage contributed to the model’s confidence token logits.
- Scoring Groundness:
  - Content‑related passages contain factual information about the question/answer.
  - Generic passages are merely examples of confidence phrasing (e.g., “I’m quite sure”).
  - The content groundness score is the proportion of total influence coming from content‑related passages (formalized below).
- Evaluation: The pipeline is run on a benchmark QA set, comparing OLMo‑2‑13B and LLaMA‑2‑13B.
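One way to formalize the content groundness score described above (a sketch only; the paper may clip or normalize influence scores differently): let R be the set of retrieved passages, C ⊆ R the content‑related subset, and I(z) the estimated influence of passage z on the confidence tokens. Assuming raw influence scores are summed directly:

```latex
\mathrm{Groundness} \;=\; \frac{\sum_{z \in \mathcal{C}} I(z)}{\sum_{z \in \mathcal{R}} I(z)}
```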
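The retrieval and influence steps can be illustrated with a minimal, self‑contained sketch. This is not the authors' implementation: it uses GPT‑2 as a stand‑in for OLMo/LLaMA, a tiny in‑memory corpus instead of the real pre‑training data, a SentenceTransformer plus FAISS index for dense retrieval, and a TracIn‑style gradient dot product restricted to the final transformer block as a cheap substitute for full influence functions. The model names, corpus, content‑related labels, and use of absolute influence are all illustrative assumptions.

```python
# Minimal TracVC-style sketch (illustrative only): dense retrieval of candidate
# training passages, first-order influence approximation, and content groundness.
import faiss
import numpy as np
import torch
from sentence_transformers import SentenceTransformer
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in corpus: a mix of content-related and generic "confidence-talk" passages.
corpus = [
    "Canberra was selected as Australia's capital in 1908.",   # content-related
    "I'm quite sure about that, no doubt at all.",              # generic
    "The Australian Parliament sits in Canberra, not Sydney.",  # content-related
    "Honestly, I'm very confident this is correct.",            # generic
    "I am absolutely certain, trust me on this.",               # generic
]
is_content_related = [True, False, True, False, False]          # illustrative labels

answer_text = "The capital of Australia is Canberra."
confidence_text = "I am very confident in this answer."

# 1) Dense retrieval over the (stand-in) pre-training corpus.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
emb = encoder.encode(corpus, normalize_embeddings=True).astype(np.float32)
index = faiss.IndexFlatIP(emb.shape[1])          # inner product = cosine (normalized)
index.add(emb)
query = encoder.encode([answer_text + " " + confidence_text],
                       normalize_embeddings=True).astype(np.float32)
_, ids = index.search(query, k=4)
candidates = [(corpus[i], is_content_related[i]) for i in ids[0]]

# 2) First-order influence: dot product of LM-loss gradients (TracIn-style),
#    restricted to GPT-2's last transformer block to keep it cheap.
tok = AutoTokenizer.from_pretrained("gpt2")      # stand-in for OLMo / LLaMA
lm = AutoModelForCausalLM.from_pretrained("gpt2")

def grad_vector(text: str) -> torch.Tensor:
    """Flattened gradient of the LM loss on `text` over the last block's weights."""
    lm.zero_grad()
    enc = tok(text, return_tensors="pt", truncation=True, max_length=128)
    loss = lm(**enc, labels=enc["input_ids"]).loss
    loss.backward()
    grads = [p.grad.flatten() for n, p in lm.named_parameters()
             if p.grad is not None and ".h.11." in n]
    return torch.cat(grads).detach()

g_conf = grad_vector(answer_text + " " + confidence_text)
influences = [(torch.dot(g_conf, grad_vector(passage)).item(), content)
              for passage, content in candidates]

# 3) Content groundness: share of (absolute) influence from content-related passages.
total = sum(abs(s) for s, _ in influences)
content_share = sum(abs(s) for s, c in influences if c)
print(f"content groundness ≈ {content_share / total:.2f}" if total else "no influence signal")
```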
Results & Findings
- Low content groundness for OLMo‑2‑13B: On average, only ≈30 % of the influence on confidence statements came from content‑relevant examples; the rest stemmed from generic confidence expressions.
- Higher groundness for LLaMA‑2‑13B: LLaMA showed a more balanced split (~55 % content‑related), suggesting it relies more on factual context when expressing confidence.
- Lexical mismatch: Many of the top‑influencing passages for OLMo were lexically unrelated to the query, indicating the model copies confidence phrasing patterns without grounding them in the answer’s substance.
- Over‑confidence patterns: Cases where the model was factually wrong but still expressed high confidence correlated with high influence from generic confidence data.
Practical Implications
- Better UI/UX for AI assistants: Knowing whether a confidence statement is truly grounded can inform when to display it to users, reducing the risk of misplaced trust.
- Fine‑tuning strategies: Developers can augment training data with paired factual content + calibrated confidence annotations, encouraging models to learn when to be confident.
- Monitoring & debugging: TracVC can be integrated into model‑serving pipelines to flag answers whose confidence is driven mainly by generic data, prompting fallback mechanisms (e.g., “I’m not sure”); a minimal gating sketch follows this list.
- Regulatory compliance: For high‑stakes domains (healthcare, finance), demonstrating content‑grounded confidence could become a compliance requirement; TracVC offers a measurable audit trail.
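For the monitoring use case, a deployment‑side gate could look like the sketch below. The threshold and the `compute_groundness` callable are assumptions standing in for a TracVC‑style scorer (e.g., the pipeline sketched in the Methodology section); the paper does not prescribe this interface.

```python
# Hypothetical serving-time gate: keep the model's verbalized confidence only when
# a TracVC-style groundness score clears a threshold; otherwise fall back to a hedge.

GROUNDNESS_THRESHOLD = 0.5  # assumed operating point; would need tuning on held-out data

def gate_confidence(answer: str, confidence: str, compute_groundness) -> str:
    """Return the answer with its confidence statement kept or replaced by a hedge."""
    score = compute_groundness(answer, confidence)   # TracVC-style scorer (assumed)
    if score >= GROUNDNESS_THRESHOLD:
        return f"{answer} {confidence}"
    # Confidence appears to be driven mainly by generic "confidence-talk" data.
    return f"{answer} (I'm not fully sure about this, though.)"

# Usage with a stubbed scorer that pretends influence was mostly generic:
print(gate_confidence("The capital of Australia is Canberra.",
                      "I am very confident in this answer.",
                      lambda a, c: 0.3))
```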
Limitations & Future Work
- Scalability: Influence estimation on billions of tokens remains computationally heavy; approximations may miss subtle influences.
- Training data access: The method assumes access to the original pre‑training corpus, which is often proprietary for commercial LLMs.
- Metric scope: Content groundness captures lexical relevance but may overlook nuanced reasoning steps that are not directly quoted in the training data.
- Future directions:
  - Develop lightweight influence proxies (e.g., using attention roll‑outs).
  - Explore curriculum learning that explicitly teaches confidence calibration.
  - Extend TracVC to multimodal models and instruction‑tuned variants.
Bottom line: TracVC exposes a hidden blind spot: LLMs can learn to sound confident without that confidence being justified. By tracing confidence statements back to their training roots, developers gain a practical tool for building more trustworthy AI systems that not only answer correctly but also know when to admit uncertainty.
Authors
- Yuxi Xia
- Loris Schoenegger
- Benjamin Roth
Paper Information
- arXiv ID: 2601.10645v1
- Categories: cs.CL
- Published: January 15, 2026