[Paper] Rhetorical Questions in LLM Representations: A Linear Probing Study

Published: April 15, 2026 at 01:50 PM EDT
Source: arXiv - 2604.14128v1

Overview

This paper investigates how large language models (LLMs) internally encode rhetorical questions: questions asked not to obtain an answer but to persuade, signal stance, or shape a conversation. By probing the hidden states of LLMs with simple linear classifiers, the authors show that rhetorical cues emerge in early layers and can be reliably detected, including when a probe trained on one social‑media dataset is applied to another.

Key Contributions

  • Linear probing framework for detecting rhetorical vs. information‑seeking questions in LLM hidden states.
  • Empirical evidence that rhetorical signals are most stable in the last‑token representation of a sequence.
  • Cross‑dataset transferability: probes trained on one corpus achieve AUROC ≈ 0.7–0.8 on another, indicating a shared but nuanced representation.
  • Multi‑directional encoding: different probes surface distinct rhetorical phenomena (discourse‑level stance vs. syntactic interrogative patterns), suggesting no single linear direction captures all rhetorical information.
  • Qualitative analysis linking probe‑specific rankings to concrete linguistic cues (e.g., extended argumentation vs. surface question forms).

Methodology

  1. Datasets – Two publicly available social‑media corpora containing both rhetorical and genuine information‑seeking questions, each with manually verified labels.
  2. Model checkpoints – Popular transformer‑based LLMs (e.g., GPT‑2, LLaMA) were run on the datasets; hidden states were extracted at every token.
  3. Linear probes – For each layer and token position, a logistic regression classifier was trained to separate rhetorical from non‑rhetorical questions using only the hidden vector (no fine‑tuning of the LLM).
  4. Evaluation – Probes were assessed with AUROC on held‑out data, and cross‑dataset transfer was measured by applying a probe trained on one corpus to the other.
  5. Ranking analysis – The top‑k instances according to each probe were compared to see how much they overlapped, revealing divergent focus areas.
  6. Qualitative inspection – Sample sentences from divergent rankings were manually examined to interpret the linguistic patterns each probe captured.
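
Steps 3 and 4 can be illustrated with a minimal sketch. It is not the authors' code: the vectors are synthetic stand-ins for last-token hidden states (an assumption, so that the example runs without downloading a model), the logistic regression is a small hand-written gradient-descent version, and the names `train_probe`, `probe_scores`, `auroc`, and `synthetic_states` are hypothetical helpers introduced only for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(X, y, lr=0.1, epochs=200):
    """Fit logistic-regression weights (w, b) by batch gradient descent."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, label in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - label
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_scores(w, b, X):
    """Probe probability that each vector belongs to class 1 (rhetorical)."""
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in X]


def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def synthetic_states(n, dim, shift, rng):
    """Synthetic stand-in for hidden states: class 1 is shifted along dimension 0."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x[0] += shift * label
        X.append(x)
        y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = synthetic_states(200, 8, 2.0, rng)
X_test, y_test = synthetic_states(100, 8, 2.0, rng)
w, b = train_probe(X_train, y_train)
held_out_auroc = auroc(probe_scores(w, b, X_test), y_test)
print(f"held-out AUROC: {held_out_auroc:.3f}")
```

In a real setup, `X_train` would hold one hidden vector per question, taken from a chosen layer and token position of a frozen LLM, and cross‑dataset transfer would amount to calling `probe_scores` with the same `w` and `b` on vectors from the other corpus.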

Results & Findings

What the authors found, by aspect:

  • Layer & token position – Rhetorical signals appear as early as the 3rd–4th transformer layer, but the last token (i.e., the final hidden state) consistently yields the highest AUROC.
  • Separability – Within each dataset, rhetorical questions are linearly separable from information‑seeking ones (AUROC 0.78–0.84).
  • Cross‑dataset transfer – Probes transfer reasonably well (AUROC 0.70–0.80), but the ranking overlap of top‑k predictions is low (< 20 %).
  • Multiple linear directions – Probes trained on different corpora prioritize different cues: some focus on discourse‑level stance (e.g., sarcasm, argument continuation), others on surface syntax (e.g., presence of “why” or “how” without a following answer).
  • Interpretability – Qualitative examples confirm that the model encodes both high‑level pragmatic intent and low‑level syntactic patterns, each captured by a distinct linear direction.
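
The low ranking overlap can be made concrete with a small sketch. The exact overlap metric the authors used is not stated in this summary, so the definition below (shared items among each probe's k highest‑scoring questions, divided by k) is an assumption, and the function name `top_k_overlap` and the score lists are hypothetical.

```python
def top_k_overlap(scores_a, scores_b, k):
    """Fraction of the k highest-scoring item indices shared by two probes."""
    def top_k(scores):
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return set(order[:k])
    return len(top_k(scores_a) & top_k(scores_b)) / k


# Hypothetical scores from two probes over the same six questions.
probe_a = [0.95, 0.90, 0.85, 0.40, 0.30, 0.20]
probe_b = [0.60, 0.55, 0.92, 0.97, 0.88, 0.10]

overlap = top_k_overlap(probe_a, probe_b, k=3)
print(f"top-3 overlap: {overlap:.2f}")
```

Here both probes give the first three questions scores above 0.5, yet their top‑3 sets share only one question. This is how two probes can agree reasonably well as classifiers while disagreeing about which examples are the most rhetorical.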

Practical Implications

  • Content moderation & sentiment analysis – Detecting rhetorical questions can help platforms flag persuasive or manipulative language (e.g., political trolling) without misclassifying genuine user queries.
  • Chatbot design – Knowing that LLMs already embed rhetorical cues means developers can build lightweight classifiers to adjust response strategies (e.g., respond with acknowledgment rather than an answer).
  • Prompt engineering – When crafting prompts that involve rhetorical devices (e.g., “Isn’t this amazing?”), developers can anticipate that the model’s hidden states already carry that stance, enabling more nuanced control over downstream tasks like tone‑adjusted generation.
  • Transferable tooling – Since linear probes transfer across datasets with decent AUROC, a single trained probe could be packaged as a reusable module for rhetorical‑question detection. Because a probe operates on one model's hidden states at a specific layer, it would need to be retrained for each different LLM.
  • Explainability dashboards – The multi‑directional nature of the encoding suggests that visualizing multiple probe scores (e.g., “discourse stance” vs. “syntactic interrogative”) could give developers richer insight into why a model treats a question as rhetorical.
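
The chatbot and dashboard ideas above could look roughly like the following sketch. Everything in it is hypothetical: the probe names, the weights and biases in `PROBES`, the three‑dimensional hidden vectors, and the 0.5 threshold are illustrative assumptions, not values from the paper.

```python
import math


def probe_score(weights, bias, hidden):
    """Logistic score of one hidden vector under one linear probe."""
    z = sum(w * h for w, h in zip(weights, hidden)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical probe parameters: name -> (weights, bias).
PROBES = {
    "discourse_stance": ([1.5, 0.0, -0.5], -0.2),
    "syntactic_interrogative": ([0.0, 2.0, 0.3], -0.5),
}


def score_all(hidden):
    """Score a hidden vector under every probe, e.g. for display in a dashboard."""
    return {name: probe_score(w, b, hidden) for name, (w, b) in PROBES.items()}


def choose_strategy(scores, threshold=0.5):
    """Acknowledge if any probe flags the question as rhetorical, else answer it."""
    return "acknowledge" if max(scores.values()) >= threshold else "answer"


scores = score_all([2.0, 0.0, 0.0])  # toy hidden vector
print(scores, choose_strategy(scores))
```

Reporting each probe's score separately, rather than a single merged score, keeps the "why" visible: in this toy case the question is flagged by the stance probe but not by the syntactic one.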

Limitations & Future Work

  • Dataset scope – Only two social‑media corpora were examined; results may differ on formal text (news, academic writing) or other languages.
  • Linear probe simplicity – While informative, linear classifiers cannot capture non‑linear interactions that might also encode rhetorical intent.
  • Interpretation granularity – The study qualitatively links probe directions to linguistic phenomena, but a systematic taxonomy of rhetorical cues remains open.
  • Model diversity – Experiments focused on a handful of transformer checkpoints; extending to encoder‑only models (e.g., BERT) or newer instruction‑tuned LLMs could reveal different encoding patterns.
  • Application testing – Real‑world deployment (e.g., moderation pipelines) was not evaluated; future work could measure downstream impact on user experience and false‑positive rates.

Authors

  • Louie Hong Yao
  • Vishesh Anand
  • Yuan Zhuang
  • Tianyu Jiang

Paper Information

  • arXiv ID: 2604.14128v1
  • Categories: cs.CL, cs.AI, cs.LG
  • Published: April 15, 2026