[Paper] Whisper Hallucination Detection and Mitigation via Hidden Representation Steering and Sparse AutoEncoders

Published: June 5, 2026 at 01:26 PM EDT

Source: arXiv - 2606.07473v1

Overview

Whisper, a widely adopted automatic speech recognition (ASR) model, is known to suffer from hallucinations: coherent transcriptions generated for non-speech audio that are entirely disconnected from the input. We investigate whether hallucinations can be detected and mitigated through Whisper’s internal representations. We extract audio encoder activations and evaluate two representation spaces: raw Whisper activations and Sparse AutoEncoder (SAE) latents. We show that both spaces encode linearly separable hallucination-related information, with discriminative power concentrated in a sparse subset of features and increasing toward deeper encoder layers. We propose two steering strategies: activation-space steering and SAE latent-space steering. On the full non-speech test set, SAE-based steering reduces the hallucination rate from 72.63% to 14.11% for Whisper small and from 86.88% to 27.33% for Whisper large-v3, with a small word error rate (WER) degradation on speech data, approaching the performance of fine-tuning-based methods.
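To make the linear-separability claim more concrete, here is a minimal sketch of a linear probe. It runs on synthetic vectors, not real Whisper encoder activations, and everything in it is an assumption for illustration: the dimensionality, the number of informative features, the class shift, and the helper names (`make_synthetic_activations`, `fit_linear_probe`, `probe_accuracy`). It is not the paper's probing setup.

```python
import numpy as np

def make_synthetic_activations(n_per_class=200, dim=64, n_informative=4,
                               shift=2.0, seed=0):
    """Synthetic stand-in for pooled encoder activations (NOT real Whisper data).

    Only the first `n_informative` dimensions differ between the two classes,
    mimicking discriminative power concentrated in a sparse feature subset.
    """
    rng = np.random.default_rng(seed)
    normal = rng.normal(size=(n_per_class, dim))
    halluc = rng.normal(size=(n_per_class, dim))
    halluc[:, :n_informative] += shift
    X = np.vstack([normal, halluc])
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    return X, y

def fit_linear_probe(X, y, lr=0.1, steps=500):
    """Logistic-regression probe trained with plain gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(hallucination)
        grad = p - y                             # gradient of log-loss w.r.t. logits
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def probe_accuracy(X, y, w, b):
    """Fraction of examples whose sign of the logit matches the label."""
    return float((((X @ w + b) > 0).astype(float) == y).mean())

X, y = make_synthetic_activations()
w, b = fit_linear_probe(X, y)
acc = probe_accuracy(X, y, w, b)
print(f"probe accuracy on synthetic data: {acc:.3f}")
print("largest-magnitude probe weights:", np.argsort(-np.abs(w))[:4])
```

With real data, `X` would hold activations extracted from a chosen encoder layer (or SAE latents computed from them), and the probe would be evaluated on held-out examples rather than on its own training set.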

Key Contributions

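As described in the abstract, the paper (arXiv categories cs.SD and cs.AI) makes the following contributions:

Shows that both raw Whisper encoder activations and SAE latents encode linearly separable hallucination-related information.
Finds that the discriminative power is concentrated in a sparse subset of features and increases toward deeper encoder layers.
Proposes two steering strategies: activation-space steering and SAE latent-space steering.
Reports that SAE-based steering lowers the hallucination rate on non-speech audio for both Whisper small and Whisper large-v3, with small WER degradation on speech data.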

Methodology

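The abstract outlines the approach at a high level: activations are extracted from Whisper’s audio encoder and analyzed in two representation spaces, raw activations and SAE latents, first to detect hallucinations and then to steer the model away from them. Details such as datasets, probe design, and steering hyperparameters are not given in the abstract; please refer to the full paper for them.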
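The Overview names two steering strategies without giving their formulas, so the following is only a generic sketch of what such steering often looks like, not the authors' method. It assumes additive steering against a unit-norm direction for the activation-space variant, and rescaling of selected latents (while preserving the SAE's reconstruction error) for the latent-space variant. `TinySAE` uses random, untrained weights, and the direction, latent indices, and all names are hypothetical.

```python
import numpy as np

def steer_activation(h, direction, alpha=1.0):
    """Activation-space steering (assumed form): shift h against a unit-norm direction."""
    d = direction / np.linalg.norm(direction)
    return h - alpha * d

class TinySAE:
    """Toy sparse autoencoder with random, untrained weights (illustration only)."""

    def __init__(self, dim, n_latents, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(scale=dim ** -0.5, size=(dim, n_latents))
        self.b_enc = np.zeros(n_latents)
        self.W_dec = rng.normal(scale=n_latents ** -0.5, size=(n_latents, dim))
        self.b_dec = np.zeros(dim)

    def encode(self, h):
        # ReLU encoder: non-negative latent activations
        return np.maximum(0.0, (h - self.b_dec) @ self.W_enc + self.b_enc)

    def decode(self, z):
        return z @ self.W_dec + self.b_dec

def steer_sae_latents(h, sae, latent_ids, scale=0.0):
    """SAE latent-space steering (assumed form): rescale chosen latents and
    add back the SAE's reconstruction error so untouched content is preserved."""
    z = sae.encode(h)
    residual = h - sae.decode(z)
    z_steered = z.copy()
    z_steered[..., latent_ids] *= scale
    return sae.decode(z_steered) + residual

rng = np.random.default_rng(1)
h = rng.normal(size=(5, 16))       # 5 frames x 16 dims, stand-in for encoder states
direction = rng.normal(size=16)    # hypothetical "hallucination" direction
h_act = steer_activation(h, direction, alpha=2.0)

sae = TinySAE(dim=16, n_latents=64)
h_sae = steer_sae_latents(h, sae, latent_ids=[3, 7], scale=0.0)
```

Adding the residual back means that with `scale=1.0` the function returns `h` unchanged, so any difference in the output comes only from the latents that were edited. In practice the direction or latent set would be chosen from features that a probe identifies as hallucination-related.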

Practical Implications

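The reported results suggest that Whisper hallucinations on non-speech audio can be substantially reduced by steering the model’s internal representations, at the cost of a small WER degradation on speech data and with performance approaching that of fine-tuning-based methods.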
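To put the hallucination rates quoted in the Overview in perspective, the short script below converts them into absolute and relative reductions. The input figures are the ones reported in the abstract; the derived numbers are simple arithmetic on them and do not appear in the source, and the `reduction` helper is introduced here only for illustration.

```python
# Hallucination rates (%) before and after SAE-based steering, as quoted in the abstract.
reported = {
    "Whisper small": (72.63, 14.11),
    "Whisper large-v3": (86.88, 27.33),
}

def reduction(before, after):
    """Return (absolute drop in percentage points, relative drop in percent)."""
    absolute = before - after
    relative = 100.0 * absolute / before
    return absolute, relative

for model, (before, after) in reported.items():
    absolute, relative = reduction(before, after)
    print(f"{model}: -{absolute:.2f} points, {relative:.1f}% relative reduction")

# Whisper small: -58.52 points, 80.6% relative reduction
# Whisper large-v3: -59.55 points, 68.5% relative reduction
```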

Authors

Georgii Aparin
Vadim Popov
Tasnima Sadekova
Assel Yermekova

Paper Information

arXiv ID: 2606.07473v1
Categories: cs.SD, cs.AI
Published: June 5, 2026
