[Paper] Influential Training Data Retrieval for Explaining Verbalized Confidence of LLMs

Published: January 15, 2026 at 01:05 PM EST
4 min read

Source: arXiv - 2601.10645v1

Overview

Large language models (LLMs) are increasingly asked to state how confident they are about the answers they generate, a feature that can boost user trust. Yet, research shows that these confidence statements are often misaligned with factual correctness—the models sound sure even when they’re wrong. This paper introduces TracVC, a technique that traces a model’s verbalized confidence back to the specific training examples that influenced it, shedding light on why LLMs sometimes over‑confidently “talk the talk” without “walking the walk.”

Key Contributions

  • TracVC framework: Combines information retrieval with influence estimation to map a model’s confidence expression to the most influential training passages.
  • Content groundness metric: A novel evaluation that quantifies how much a confidence statement is rooted in content‑relevant examples versus generic “confidence‑talk” snippets.
  • Empirical study on OLMo and LLaMA: Demonstrates that a 13B‑parameter OLMo model often draws on lexically unrelated confidence‑related data, indicating superficial mimicry rather than true grounding.
  • Insight into training dynamics: Highlights a systemic issue where current pre‑training pipelines teach LLMs how to sound confident, not when confidence is warranted.

Methodology

  1. Data Retrieval: For each generated answer‑confidence pair, the authors retrieve a set of candidate training passages using a dense vector search (e.g., FAISS) over the original pre‑training corpus.
  2. Influence Estimation: They apply a gradient‑based influence function (similar to Koh & Liang, 2017) to estimate how much each retrieved passage contributed to the model’s confidence token logits.
  3. Scoring Groundness: The content groundness score is the proportion of total influence that comes from content‑related passages.
    • Content‑related passages contain factual information about the question/answer.
    • Generic passages are merely examples of confidence phrasing (e.g., “I’m quite sure”).
  4. Evaluation: The pipeline is run on a benchmark QA set, comparing OLMo‑2‑13B and LLaMA‑2‑13B (a minimal code sketch of steps 1–3 follows this list).
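
To make the pipeline concrete, here is a minimal sketch of steps 1–3. It rests on several simplifying assumptions: a FAISS index built offline over pre‑training passages, an off‑the‑shelf sentence encoder for retrieval, a small stand‑in language model in place of a 13B one, a plain gradient dot product as a first‑order substitute for the Koh & Liang (2017) influence function, a user‑supplied `is_content_related` labeler, and absolute influence values in the groundness score. None of these choices are the authors' implementation; they only illustrate the shape of the computation.

```python
# Minimal, hypothetical sketch of the three-step TracVC-style pipeline above:
# dense retrieval over a pre-indexed corpus, a first-order influence proxy,
# and the content groundness score. Model and encoder names are placeholders.
import faiss
import numpy as np
import torch
from sentence_transformers import SentenceTransformer
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # stand-in; the paper studies OLMo-2-13B and LLaMA-2-13B
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # retrieval encoder (assumption)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)


def retrieve_candidates(query, index, passages, k=50):
    """Step 1: dense retrieval of candidate training passages for one answer-confidence pair."""
    q = embedder.encode([query], normalize_embeddings=True)
    _, ids = index.search(np.asarray(q, dtype="float32"), k)
    return [passages[i] for i in ids[0]]


def loss_grad(text):
    """Flattened gradient of the LM loss on `text` with respect to model parameters."""
    model.zero_grad()
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
    loss = model(**enc, labels=enc["input_ids"]).loss
    loss.backward()
    return torch.cat([p.grad.detach().flatten()
                      for p in model.parameters() if p.grad is not None])


def influence_scores(confidence_statement, passages):
    """Step 2: first-order influence proxy (gradient dot product), a simplification
    of a Koh & Liang (2017)-style influence function without the inverse-Hessian term."""
    g_conf = loss_grad(confidence_statement)
    return [float(torch.dot(g_conf, loss_grad(p))) for p in passages]


def content_groundness(scores, passages, is_content_related):
    """Step 3: share of total (absolute) influence attributable to content-related
    passages rather than generic confidence phrasing."""
    total = sum(abs(s) for s in scores) or 1.0
    grounded = sum(abs(s) for s, p in zip(scores, passages) if is_content_related(p))
    return grounded / total
```

Dropping the inverse‑Hessian term keeps the sketch readable; at pre‑training scale, influence estimation needs much heavier approximations, which is exactly the scalability concern noted under Limitations below.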

Results & Findings

  • Low content groundness for OLMo‑2‑13B: On average, only ≈30% of the influence on confidence statements came from content‑relevant examples; the rest stemmed from generic confidence expressions.
  • Higher groundness for LLaMA‑2‑13B: LLaMA showed a more balanced split (~55% content‑related), suggesting it relies more on factual context when expressing confidence.
  • Lexical mismatch: Many of the top‑influencing passages for OLMo were lexically unrelated to the query, indicating the model copies confidence phrasing patterns without grounding them in the answer’s substance.
  • Over‑confidence patterns: Cases where the model was factually wrong but still expressed high confidence correlated with high influence from generic confidence data.

Practical Implications

  • Better UI/UX for AI assistants: Knowing whether a confidence statement is truly grounded can inform when to display it to users, reducing the risk of misplaced trust.
  • Fine‑tuning strategies: Developers can augment training data with paired factual content + calibrated confidence annotations, encouraging models to learn when to be confident.
  • Monitoring & debugging: TracVC can be integrated into model‑serving pipelines to flag answers whose confidence is driven mainly by generic data, prompting fallback mechanisms (e.g., “I’m not sure”); a sketch of such a guard follows this list.
  • Regulatory compliance: For high‑stakes domains (healthcare, finance), demonstrating content‑grounded confidence could become a compliance requirement; TracVC offers a measurable audit trail.
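
As a concrete illustration of the monitoring idea above, the sketch below shows a hypothetical serving‑side guard: when the content groundness score attached to a response falls below a chosen threshold, the assistant withholds the confident phrasing and falls back to an explicit uncertainty statement. The threshold value and function names are assumptions for illustration, not something prescribed by the paper.

```python
# Hypothetical serving-side guard: surface the model's verbalized confidence
# only when a TracVC-style content groundness score suggests it is grounded
# in content-relevant training data. Threshold and names are illustrative.
GROUNDNESS_THRESHOLD = 0.5  # illustrative cutoff, not taken from the paper


def render_response(answer: str, confidence_phrase: str, groundness: float) -> str:
    """Attach the confidence phrase only when it appears content-grounded."""
    if groundness >= GROUNDNESS_THRESHOLD:
        return f"{answer} ({confidence_phrase})"
    # Fall back to explicit uncertainty when the confidence seems to be driven
    # mainly by generic confidence-talk passages.
    return f"{answer} (I'm not fully sure about this, please verify.)"


print(render_response("The capital of Australia is Canberra.", "I'm quite sure", 0.72))
print(render_response("The capital of Australia is Sydney.", "I'm quite sure", 0.18))
```

In practice the threshold would need tuning against a calibration set, and the groundness score would be computed offline or asynchronously, since influence estimation is far too expensive for the request path.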

Limitations & Future Work

  • Scalability: Influence estimation on billions of tokens remains computationally heavy; approximations may miss subtle influences.
  • Training data access: The method assumes access to the original pre‑training corpus, which is often proprietary for commercial LLMs.
  • Metric scope: Content groundness captures lexical relevance but may overlook nuanced reasoning steps that are not directly quoted in the training data.
  • Future directions:
    • Develop lightweight influence proxies (e.g., using attention roll‑outs).
    • Explore curriculum learning that explicitly teaches confidence calibration.
    • Extend TracVC to multimodal models and instruction‑tuned variants.

Bottom line: TracVC shines a light on a hidden blind spot: LLMs can learn to sound confident without that confidence being justified. By tracing confidence statements back to their training roots, developers gain a practical tool for building more trustworthy AI systems that not only answer correctly but also know when to admit uncertainty.

Authors

  • Yuxi Xia
  • Loris Schoenegger
  • Benjamin Roth

Paper Information

  • arXiv ID: 2601.10645v1
  • Categories: cs.CL
  • Published: January 15, 2026