[Paper] Influential Training Data Retrieval for Explaining Verbalized Confidence of LLMs
Source: arXiv - 2601.10645v1
Overview
Large language models (LLMs) are increasingly asked to state how confident they are about the answers they generate, a feature that can boost user trust. Yet, research shows that these confidence statements are often misaligned with factual correctness—the models sound sure even when they’re wrong. This paper introduces TracVC, a technique that traces a model’s verbalized confidence back to the specific training examples that influenced it, shedding light on why LLMs sometimes over‑confidently “talk the talk” without “walking the walk.”
Key Contributions
- TracVC framework: Combines information retrieval with influence estimation to map a model’s confidence expression to the most influential training passages.
- Content groundness metric: A novel evaluation that quantifies how much a confidence statement is rooted in content‑relevant examples versus generic “confidence‑talk” snippets.
- Empirical study on OLMo and LLaMA: Demonstrates that a 13B-parameter OLMo model often draws on lexically unrelated confidence‑related data, indicating superficial mimicry rather than true grounding.
- Insight into training dynamics: Highlights a systemic issue where current pre‑training pipelines teach LLMs how to sound confident, not when confidence is warranted.
Methodology
- Data Retrieval: For each generated answer‑confidence pair, the authors retrieve a set of candidate training passages using a dense vector search (e.g., FAISS) over the original pre‑training corpus; a simplified end‑to‑end sketch follows this list.
- Influence Estimation: They apply a gradient‑based influence function (similar to Koh & Liang, 2017) to estimate how much each retrieved passage contributed to the model’s confidence token logits.
- Scoring Groundness:
  - Content‑related passages contain factual information about the question/answer.
  - Generic passages are merely examples of confidence phrasing (e.g., “I’m quite sure”).
  - The content groundness score is the proportion of total influence coming from content‑related passages (formalized below).
- Evaluation: The pipeline is run on a benchmark QA set, comparing OLMo‑2‑13B and LLaMA‑2‑13B.
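One way to formalize the content groundness score described above (a sketch only; the paper may clip or normalize influence scores differently): let R be the set of retrieved passages, C ⊆ R the content‑related subset, and I(z) the estimated influence of passage z on the confidence tokens. Assuming raw influence scores are summed directly:

```latex
\mathrm{Groundness} \;=\; \frac{\sum_{z \in \mathcal{C}} I(z)}{\sum_{z \in \mathcal{R}} I(z)}
```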
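The retrieval and influence steps can be illustrated with a minimal, self‑contained sketch. This is not the authors' implementation: it uses GPT‑2 as a stand‑in for OLMo/LLaMA, a tiny in‑memory corpus instead of the real pre‑training data, a SentenceTransformer plus FAISS index for dense retrieval, and a TracIn‑style gradient dot product restricted to the final transformer block as a cheap substitute for full influence functions. The model names, corpus, content‑related labels, and use of absolute influence are all illustrative assumptions.

```python
# Minimal TracVC-style sketch (illustrative only): dense retrieval of candidate
# training passages, first-order influence approximation, and content groundness.
import faiss
import numpy as np
import torch
from sentence_transformers import SentenceTransformer
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in corpus: a mix of content-related and generic "confidence-talk" passages.
corpus = [
    "Canberra was selected as Australia's capital in 1908.",   # content-related
    "I'm quite sure about that, no doubt at all.",              # generic
    "The Australian Parliament sits in Canberra, not Sydney.",  # content-related
    "Honestly, I'm very confident this is correct.",            # generic
    "I am absolutely certain, trust me on this.",               # generic
]
is_content_related = [True, False, True, False, False]          # illustrative labels

answer_text = "The capital of Australia is Canberra."
confidence_text = "I am very confident in this answer."

# 1) Dense retrieval over the (stand-in) pre-training corpus.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
emb = encoder.encode(corpus, normalize_embeddings=True).astype(np.float32)
index = faiss.IndexFlatIP(emb.shape[1])          # inner product = cosine (normalized)
index.add(emb)
query = encoder.encode([answer_text + " " + confidence_text],
                       normalize_embeddings=True).astype(np.float32)
_, ids = index.search(query, k=4)
candidates = [(corpus[i], is_content_related[i]) for i in ids[0]]

# 2) First-order influence: dot product of LM-loss gradients (TracIn-style),
#    restricted to GPT-2's last transformer block to keep it cheap.
tok = AutoTokenizer.from_pretrained("gpt2")      # stand-in for OLMo / LLaMA
lm = AutoModelForCausalLM.from_pretrained("gpt2")

def grad_vector(text: str) -> torch.Tensor:
    """Flattened gradient of the LM loss on `text` over the last block's weights."""
    lm.zero_grad()
    enc = tok(text, return_tensors="pt", truncation=True, max_length=128)
    loss = lm(**enc, labels=enc["input_ids"]).loss
    loss.backward()
    grads = [p.grad.flatten() for n, p in lm.named_parameters()
             if p.grad is not None and ".h.11." in n]
    return torch.cat(grads).detach()

g_conf = grad_vector(answer_text + " " + confidence_text)
influences = [(torch.dot(g_conf, grad_vector(passage)).item(), content)
              for passage, content in candidates]

# 3) Content groundness: share of (absolute) influence from content-related passages.
total = sum(abs(s) for s, _ in influences)
content_share = sum(abs(s) for s, c in influences if c)
print(f"content groundness ≈ {content_share / total:.2f}" if total else "no influence signal")
```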
Results & Findings
- Low content groundness for OLMo‑2‑13B: On average, only ≈30 % of the influence on confidence statements came from content‑relevant examples; the rest stemmed from generic confidence expressions.
- Higher groundness for LLaMA‑2‑13B: LLaMA showed a more balanced split (~55 % content‑related), suggesting it relies more on factual context when expressing confidence.
- Lexical mismatch: Many of the top‑influencing passages for OLMo were lexically unrelated to the query, indicating the model copies confidence phrasing patterns without grounding them in the answer’s substance.
- Over‑confidence patterns: Cases where the model was factually wrong but still expressed high confidence correlated with high influence from generic confidence data.
Practical Implications
- Better UI/UX for AI assistants: Knowing whether a confidence statement is truly grounded can inform when to display it to users, reducing the risk of misplaced trust.
- Fine‑tuning strategies: Developers can augment training data with paired factual content + calibrated confidence annotations, encouraging models to learn when to be confident.
- Monitoring & debugging: TracVC can be integrated into model‑serving pipelines to flag answers whose confidence is driven mainly by generic data, prompting fallback mechanisms (e.g., “I’m not sure”); a minimal gating sketch follows this list.
- Regulatory compliance: For high‑stakes domains (healthcare, finance), demonstrating content‑grounded confidence could become a compliance requirement; TracVC offers a measurable audit trail.
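For the monitoring use case, a deployment‑side gate could look like the sketch below. The threshold and the `compute_groundness` callable are assumptions standing in for a TracVC‑style scorer (e.g., the pipeline sketched in the Methodology section); the paper does not prescribe this interface.

```python
# Hypothetical serving-time gate: keep the model's verbalized confidence only when
# a TracVC-style groundness score clears a threshold; otherwise fall back to a hedge.

GROUNDNESS_THRESHOLD = 0.5  # assumed operating point; would need tuning on held-out data

def gate_confidence(answer: str, confidence: str, compute_groundness) -> str:
    """Return the answer with its confidence statement kept or replaced by a hedge."""
    score = compute_groundness(answer, confidence)   # TracVC-style scorer (assumed)
    if score >= GROUNDNESS_THRESHOLD:
        return f"{answer} {confidence}"
    # Confidence appears to be driven mainly by generic "confidence-talk" data.
    return f"{answer} (I'm not fully sure about this, though.)"

# Usage with a stubbed scorer that pretends influence was mostly generic:
print(gate_confidence("The capital of Australia is Canberra.",
                      "I am very confident in this answer.",
                      lambda a, c: 0.3))
```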
Limitations & Future Work
- Scalability: Influence estimation on billions of tokens remains computationally heavy; approximations may miss subtle influences.
- Training data access: The method assumes access to the original pre‑training corpus, which is often proprietary for commercial LLMs.
- Metric scope: Content groundness captures lexical relevance but may overlook nuanced reasoning steps that are not directly quoted in the training data.
- Future directions:
  - Develop lightweight influence proxies (e.g., using attention roll‑outs).
  - Explore curriculum learning that explicitly teaches confidence calibration.
  - Extend TracVC to multimodal models and instruction‑tuned variants.
Bottom line: TracVC exposes a hidden blind spot: LLMs can learn to sound confident without that confidence being justified. By tracing confidence statements back to their training roots, developers gain a practical tool for building more trustworthy AI systems that not only answer correctly but also know when to admit uncertainty.
Authors
- Yuxi Xia
- Loris Schoenegger
- Benjamin Roth
Paper Information
- arXiv ID: 2601.10645v1
- Categories: cs.CL
- Published: January 15, 2026