[Paper] What Am I Missing? Question-Answering as Hidden State Probing

Published: 1 week ago (May 29, 2026 at 01:27 PM EDT)

Source: arXiv - 2605.31561v1

Overview

The paper investigates how large language models (LLMs) reason at inference time by treating the act of asking a follow‑up question as a window into the model’s hidden state. The authors train a lightweight probe on the student model’s internal representations before and after it generates a question, and show that these representations already carry a strong signal about whether the final answer will be correct, even before the teacher model replies. This opens the door to using self‑generated questions as a diagnostic tool for improving answer quality.

Key Contributions

Hidden‑state probing for self‑diagnosis: Demonstrates that a simple classifier can predict the eventual correctness of a reasoning trajectory from the student’s hidden state surrounding question generation.
Student‑teacher probing framework: Introduces a novel setup where a “student” model asks clarifying questions to a “teacher” model, allowing the study of information flow during chain‑of‑thought reasoning.
Sequential decision formulation: Casts question‑asking as a gating policy that decides, at each step, whether to ask a question to maximize the chance of a correct final answer.
Empirical analysis of self‑consistency: Shows that the effectiveness of question‑asking hinges on the model’s intrinsic self‑consistency, revealing a gap between detecting errors and actually fixing them.
Insight into self‑refinement limits: Highlights that current LLMs can diagnose uncertainty well but often fail to recover from mistakes when intervened upon.

Methodology

Student‑Teacher Interaction:
- The student receives a problem prompt and may generate a clarifying question.
- The teacher (a larger or more capable LLM) answers the question, after which the student continues toward a final answer.
Hidden‑State Probe:
- A shallow neural probe (e.g., a linear classifier) is trained on the student’s hidden vectors before and after question generation.
- Labels are binary: trajectory ends correct vs. trajectory ends incorrect.
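The probe described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the "before" and "after" hidden vectors are simply concatenated, uses plain logistic regression as the linear classifier, and replaces real hidden states with synthetic vectors. The names `probe_features`, `train_probe`, `probe_confidence`, and `make_toy_trajectories` are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def probe_features(h_before, h_after):
    """Concatenate hidden vectors from before and after question generation.
    Concatenation is an assumption; the summary does not say how they are combined."""
    return list(h_before) + list(h_after)


def probe_confidence(w, b, x):
    """Probe's estimated probability that the trajectory ends correct."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_probe(features, labels, lr=0.1, epochs=200):
    """Fit a linear (logistic-regression) probe with batch gradient descent."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = probe_confidence(w, b, x) - y  # log-loss gradient w.r.t. the logit
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def make_toy_trajectories(n, dim=4, seed=0):
    """Synthetic stand-in for real hidden states (not data from the paper).
    Label 1 = trajectory ends correct, 0 = trajectory ends incorrect."""
    rng = random.Random(seed)
    features, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        shift = 1.0 if y == 1 else -1.0
        h_before = [rng.gauss(shift, 1.0) for _ in range(dim)]
        h_after = [rng.gauss(shift, 1.0) for _ in range(dim)]
        features.append(probe_features(h_before, h_after))
        labels.append(y)
    return features, labels


if __name__ == "__main__":
    X, y = make_toy_trajectories(200)
    w, b = train_probe(X, y)
    hits = sum((probe_confidence(w, b, x) >= 0.5) == bool(t) for x, t in zip(X, y))
    print(f"training accuracy on toy data: {hits / len(X):.2f}")
```

In a real setting the toy vectors would be replaced by hidden states extracted from the student model, and accuracy would be measured on held-out trajectories rather than on the training set.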
Gating Policy:
- Using the probe’s confidence as a quality score, a policy decides whether to ask a question at a given step.
- The policy is optimized to maximize the probability that the final answer is correct, treating question‑asking as a sequential decision problem.
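One simple way to picture such a policy is a confidence threshold: ask a question only when the probe's confidence falls below it, and pick the threshold that gives the best final accuracy on validation data. The sketch below is a single-step simplification of the sequential formulation and is not taken from the paper. The record layout `(confidence, correct_without_question, correct_with_question)` and the function names are assumptions made for illustration.

```python
def gated_accuracy(records, threshold):
    """Final accuracy when a question is asked only if confidence < threshold.

    Each record is (confidence, correct_without_question, correct_with_question),
    where the two outcome fields are 0 or 1. This layout is hypothetical.
    """
    total = 0
    for confidence, ok_without, ok_with in records:
        asks = confidence < threshold
        total += ok_with if asks else ok_without
    return total / len(records)


def best_threshold(records, candidates):
    """Pick the candidate threshold with the highest gated accuracy."""
    return max(candidates, key=lambda t: gated_accuracy(records, t))


if __name__ == "__main__":
    # Toy validation set: two low-confidence trajectories that a question fixes,
    # one high-confidence trajectory that a question would break, one unaffected.
    toy = [(0.2, 0, 1), (0.3, 0, 1), (0.8, 1, 0), (0.9, 1, 1)]
    for t in (0.0, 0.5, 1.0):
        print(t, gated_accuracy(toy, t))
    print("best:", best_threshold(toy, [0.0, 0.5, 1.0]))
```

A threshold of 0.0 corresponds to never asking and 1.0 to always asking (for confidences below 1), so the sweep also shows how the gate compares against both fixed strategies.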
Evaluation:
- Experiments are run on standard reasoning benchmarks (e.g., GSM‑8K, MathQA).
- Metrics include detection accuracy (how well the probe predicts correctness) and overall answer accuracy after applying the gating policy.
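As a rough sketch of the first metric, detection accuracy can be computed by thresholding the probe's confidence and comparing against the correctness labels. The majority-class baseline is included here as an assumption about what a useful comparison looks like, since a detection score only means something relative to how often trajectories are correct in the first place. The function names and the 0.5 cutoff are illustrative choices, not details from the paper.

```python
def detection_accuracy(confidences, labels, threshold=0.5):
    """Share of trajectories whose correctness the probe predicts correctly."""
    predictions = [1 if c >= threshold else 0 for c in confidences]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def majority_baseline(labels):
    """Accuracy of always predicting the more common outcome."""
    positive_rate = sum(labels) / len(labels)
    return max(positive_rate, 1.0 - positive_rate)


if __name__ == "__main__":
    confidences = [0.9, 0.8, 0.3, 0.6, 0.2]  # toy probe outputs
    labels = [1, 1, 0, 0, 0]                 # 1 = trajectory ended correct
    print(detection_accuracy(confidences, labels))
    print(majority_baseline(labels))
```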

Results & Findings

Aspect	Observation
Probe Predictive Power	The probe achieves >80 % accuracy in forecasting final correctness from the student’s hidden state before seeing the teacher’s answer.
Self‑Consistency Dependency	Models with higher self‑consistency (i.e., generating similar reasoning paths across samples) benefit more from the gating policy.
Diagnosis vs. Recovery Gap	The gating policy reliably flags uncertain or incorrect trajectories, but the intervention itself (asking a question) leads to a correct answer only ~50 % of the time, roughly as often as it harms an already‑correct answer.
Overall Accuracy	Applying the gating policy yields modest gains (≈2–3 % absolute) on benchmark scores, confirming that question‑asking can help but is not a silver bullet.
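A back-of-the-envelope calculation shows why good diagnosis can still translate into small gains. Under one reading of the table, the net change in accuracy is the share of wrong trajectories that get fixed minus the share of correct trajectories that get broken. The function and every number below are hypothetical and chosen only to illustrate the arithmetic; they are not results from the paper.

```python
def net_accuracy_change(wrong_and_asked, fix_rate, correct_and_asked, break_rate):
    """Absolute change in accuracy caused by interventions.

    wrong_and_asked:   share of all trajectories that are wrong and get a question
    fix_rate:          chance that a question turns a wrong answer into a right one
    correct_and_asked: share of all trajectories that are correct and get a question
    break_rate:        chance that a question turns a right answer into a wrong one
    """
    return wrong_and_asked * fix_rate - correct_and_asked * break_rate


if __name__ == "__main__":
    # Hypothetical case: the gate intervenes on 10% of trajectories that are wrong
    # and 5% that are already correct, with fix and break rates both at 50%.
    print(net_accuracy_change(0.10, 0.5, 0.05, 0.5))
```

With equal fix and break rates, the gain depends entirely on how much better the gate is at selecting wrong trajectories than correct ones, which is why better detection alone has a limited payoff.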

Practical Implications

Debug‑Friendly LLM APIs: Developers could expose a “diagnostic mode” where the model returns a confidence score derived from its hidden state, allowing downstream systems to decide whether to request clarification or fall back to a simpler heuristic.
Cost‑Effective Self‑Check: Instead of always invoking a larger teacher model, a lightweight probe can quickly estimate answer reliability, saving compute when the model is already confident.
Interactive QA Systems: Chatbots can be programmed to ask clarifying questions only when the probe signals high uncertainty, leading to more natural and efficient conversations.
Curriculum Design for Fine‑Tuning: Training pipelines might incorporate a probing head to encourage models to develop richer internal representations that are easier to diagnose, potentially improving robustness.

Limitations & Future Work

Intervention Effectiveness: The current gating policy does not consistently turn a wrong answer into a right one; more sophisticated question‑generation or teacher selection strategies are needed.
Model Size & Architecture Dependence: Results are reported on a limited set of LLM families; it remains unclear how well the approach scales to very large or multimodal models.
Probe Simplicity: A linear probe may miss deeper nuances; exploring richer probing architectures could improve diagnostic fidelity.
User‑Facing Transparency: Translating hidden‑state confidence into human‑readable explanations is still an open challenge.

Bottom line: This work shows that LLMs already “know” when they’re likely to go astray, as evidenced by their hidden states during question generation. Harnessing that self‑diagnostic signal could make AI systems more reliable, but turning diagnosis into correction will require smarter interventions and tighter integration between student and teacher models.

Authors

Chu Fei Luo
Samuel Dahan
Xiaodan Zhu

Paper Information

arXiv ID: 2605.31561v1
Categories: cs.CL
Published: May 29, 2026
