[Paper] Rhetorical Questions in LLM Representations: A Linear Probing Study

Published: April 15, 2026 at 01:50 PM EDT

Source: arXiv - 2604.14128v1

Overview

This paper investigates how large language models (LLMs) internally encode rhetorical questions: questions posed not to elicit an answer but to persuade, signal stance, or shape a conversation. By probing LLM hidden states with simple linear classifiers, the authors show that rhetorical cues emerge early in the model's processing and can be detected well above chance, including when a probe trained on one social‑media dataset is applied to another.

Key Contributions

Linear probing framework for detecting rhetorical vs. information‑seeking questions in LLM hidden states.
Empirical evidence that rhetorical signals are most stable in the last‑token representation of a sequence.
Cross‑dataset transferability: probes trained on one corpus achieve AUROC ≈ 0.7–0.8 on another, indicating a shared but nuanced representation.
Multi‑directional encoding: different probes surface distinct rhetorical phenomena (discourse‑level stance vs. syntactic interrogative patterns), suggesting no single linear direction captures all rhetorical information.
Qualitative analysis linking probe‑specific rankings to concrete linguistic cues (e.g., extended argumentation vs. surface question forms).
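Because the paper reports that rhetorical signals are most stable at the last token, a probing pipeline first needs that vector for every question. The sketch below is a minimal NumPy illustration, not the authors' code. It assumes per-layer activations of shape (batch, seq_len, d_model) have already been extracted (for example with `output_hidden_states=True` in Hugging Face `transformers`) together with a 0/1 attention mask. The function name `last_token_states` is hypothetical.

```python
import numpy as np


def last_token_states(hidden: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Return the hidden vector at each sequence's last real (non-padding) token.

    hidden:         (batch, seq_len, d_model) activations from one layer
    attention_mask: (batch, seq_len), 1 for real tokens and 0 for padding
    """
    mask = np.asarray(attention_mask)
    seq_len = mask.shape[1]
    # argmax on the reversed mask finds the first real token counting from the
    # right, which works for both left- and right-padded batches.
    last_idx = seq_len - 1 - np.argmax(mask[:, ::-1], axis=1)
    batch_idx = np.arange(hidden.shape[0])
    return hidden[batch_idx, last_idx]  # shape (batch, d_model)
```

Locating the last real token from the mask, rather than always taking index -1, keeps the sketch correct for padded batches. It assumes every sequence contains at least one real token.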

Methodology

Datasets – Two publicly available social‑media corpora containing both rhetorical and genuine information‑seeking questions, each with manually verified labels.
Model checkpoints – Popular transformer‑based LLMs (e.g., GPT‑2, LLaMA) were run on the datasets; hidden states were extracted at every token.
Linear probes – For each layer and token position, a logistic regression classifier was trained to separate rhetorical from non‑rhetorical questions using only the hidden vector (no fine‑tuning of the LLM).
Evaluation – Probes were assessed with AUROC on held‑out data, and cross‑dataset transfer was measured by applying a probe trained on one corpus to the other.
Ranking analysis – The top‑k instances ranked by each probe were compared to measure their overlap; low overlap indicated that the probes focus on different cues.
Qualitative inspection – Sample sentences from divergent rankings were manually examined to interpret the linguistic patterns each probe captured.
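The probing loop can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the paper's implementation. The helper names (`auroc`, `train_probe`, `probe_scores`, `layerwise_auroc`) are hypothetical. The logistic regression is fit by plain full-batch gradient descent with a small L2 penalty, and the learning rate, step count, and penalty are arbitrary defaults. Each layer's last-token hidden states form a feature matrix of shape (n_questions, d_model), with label 1 for rhetorical and 0 for information-seeking. AUROC is computed with the rank-sum (Mann-Whitney) formulation.

```python
import numpy as np


def auroc(labels: np.ndarray, scores: np.ndarray) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


def train_probe(X: np.ndarray, y: np.ndarray, lr: float = 0.1, steps: int = 500, l2: float = 1e-3):
    """Fit a logistic-regression probe on frozen hidden states by gradient descent.

    X: (n, d_model) last-token hidden states from one layer
    y: (n,) labels, 1 = rhetorical, 0 = information-seeking
    """
    # Standardize features so a single learning rate is usable across layers.
    mu = X.mean(axis=0)
    sigma = X.std(axis=0) + 1e-8
    Z = (X - mu) / sigma
    w = np.zeros(Z.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))  # predicted P(rhetorical)
        err = p - y
        w -= lr * (Z.T @ err / len(y) + l2 * w)  # gradient of mean log-loss + L2
        b -= lr * err.mean()
    return mu, sigma, w, b


def probe_scores(probe, X: np.ndarray) -> np.ndarray:
    """Probe logits; they are monotone in P(rhetorical), so they suffice for AUROC."""
    mu, sigma, w, b = probe
    return ((X - mu) / sigma) @ w + b


def layerwise_auroc(train_layers, y_train, test_layers, y_test):
    """Train one probe per layer and report its held-out AUROC."""
    return [
        auroc(y_test, probe_scores(train_probe(X_tr, y_train), X_te))
        for X_tr, X_te in zip(train_layers, test_layers)
    ]
```

To mirror the cross-dataset setting, one would fit on one corpus's hidden states and pass the other corpus's hidden states and labels as `test_layers` and `y_test`. In practice, scikit-learn's `LogisticRegression` and `roc_auc_score` would be a natural drop-in for the hand-written pieces.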

Results & Findings

Aspect	What the authors found
Layer & token position	Rhetorical signals appear as early as the 3rd–4th transformer layer; across token positions, the last token (the hidden state at the sequence's final position) consistently yields the highest AUROC.
Separability	Within each dataset, rhetorical questions are largely linearly separable from information‑seeking ones (AUROC 0.78–0.84).
Cross‑dataset transfer	Probes transfer reasonably well (AUROC 0.70–0.80), but the ranking overlap of top‑k predictions is low (< 20 %).
Multiple linear directions	Probes trained on different corpora prioritize different cues: some focus on discourse‑level stance (e.g., sarcasm, argument continuation), others on surface syntax (e.g., presence of “why” or “how” without a following answer).
Interpretability	Qualitative examples confirm that the model encodes both high‑level pragmatic intent and low‑level syntactic patterns, each captured by a distinct linear direction.
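The combination of decent transfer AUROC and low top-k overlap can be checked with two small diagnostics. The sketch below is plain Python with hypothetical helper names: `top_k_overlap` measures the fraction of the k highest-scoring questions that two probes share, and `cosine` compares two probes' weight vectors directly. The cosine check is an added diagnostic, not something the summary reports the authors using, and it is only meaningful for probes trained on the same model, layer, and feature scaling.

```python
import math


def top_k_overlap(scores_a, scores_b, k):
    """Fraction of the k highest-scoring items that two probes have in common."""
    def top_k(scores):
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return set(ranked[:k])
    return len(top_k(scores_a) & top_k(scores_b)) / k


def cosine(u, v):
    """Cosine similarity between two probe weight vectors (same model and layer)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

Two probes can both rank rhetorical questions above information-seeking ones on average (similar AUROC) while disagreeing about which questions are most rhetorical; a low top-k overlap captures exactly that disagreement.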

Practical Implications

Content moderation & sentiment analysis – Detecting rhetorical questions can help platforms flag persuasive or manipulative language (e.g., political trolling) without misclassifying genuine user queries.
Chatbot design – Because LLMs already encode rhetorical cues, developers can build lightweight classifiers on their hidden states to adjust response strategies (e.g., acknowledging a rhetorical question rather than answering it).
Prompt engineering – When crafting prompts that involve rhetorical devices (e.g., “Isn’t this amazing?”), developers can anticipate that the model’s hidden states already carry that stance, enabling more nuanced control over downstream tasks like tone‑adjusted generation.
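Transferable tooling – Since linear probes transfer across domains with decent AUROC, a single pre‑trained probe could be packaged as a plug‑and‑play rhetorical‑question detector for pipelines built on the same LLM checkpoint; because a probe's weights are defined in that model's hidden space, a different model would need its own probe.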
Explainability dashboards – The multi‑directional nature of the encoding suggests that visualizing multiple probe scores (e.g., “discourse stance” vs. “syntactic interrogative”) could give developers richer insight into why a model treats a question as rhetorical.
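A minimal sketch of the chatbot and dashboard ideas, assuming probes have already been trained on last-token hidden states of the deployed model: each named probe yields its own score, and a response mode is chosen from them. The probe names `discourse_stance` and `syntactic_interrogative` are hypothetical labels for the two kinds of direction described above. Taking the maximum across probes and the 0.5 threshold are illustrative choices that would need calibration on real data.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def probe_probability(hidden, weights, bias):
    """P(rhetorical) from one linear probe applied to a last-token hidden vector."""
    return sigmoid(sum(h * w for h, w in zip(hidden, weights)) + bias)


def choose_strategy(hidden, probes, threshold=0.5):
    """Score a question with several named probes and pick a response mode.

    probes: dict mapping a probe name to (weights, bias).
    Returns the per-probe scores (e.g. for a dashboard) and "acknowledge" if any
    probe flags the question as rhetorical, otherwise "answer".
    """
    scores = {name: probe_probability(hidden, w, b) for name, (w, b) in probes.items()}
    mode = "acknowledge" if max(scores.values()) >= threshold else "answer"
    return scores, mode
```

Returning every probe's score, not just the final decision, is what would let a dashboard show whether a question was flagged for its stance or for its surface form.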

Limitations & Future Work

Dataset scope – Only two social‑media corpora were examined; results may differ on formal text (news, academic writing) or other languages.
Linear probe simplicity – While informative, linear classifiers cannot capture non‑linear interactions that might also encode rhetorical intent.
Interpretation granularity – The study qualitatively links probe directions to linguistic phenomena, but a systematic taxonomy of rhetorical cues remains open.
Model diversity – Experiments focused on a handful of transformer checkpoints; extending to encoder‑only models (e.g., BERT) or newer instruction‑tuned LLMs could reveal different encoding patterns.
Application testing – Real‑world deployment (e.g., moderation pipelines) was not evaluated; future work could measure downstream impact on user experience and false‑positive rates.

Authors

Louie Hong Yao
Vishesh Anand
Yuan Zhuang
Tianyu Jiang

Paper Information

arXiv ID: 2604.14128v1
Categories: cs.CL, cs.AI, cs.LG
Published: April 15, 2026