[Paper] AuRA: Internalizing Audio Understanding into LLMs as LoRA

Published: June 9, 2026 at 12:05 PM EDT

Source: arXiv - 2606.11033v1

Overview

Recent efforts to extend large language models (LLMs) to speech inputs typically rely on cascaded ASR-LLM pipelines, end-to-end speech-language models, or bridge/distillation-based adaptation. While these routes respectively reuse strong pretrained components, enable native speech-language interaction, or offer lightweight adaptation, they often suffer from transcript-interface latency, costly multimodal training, or sequential speech-language coupling. To address these limitations, we present AuRA, a method that distills audio encoding capability into the LLM. Specifically, AuRA feeds the same speech input to an ASR encoder (as a teacher) and a LoRA-adapted LLM (as a student) through a lightweight audio embedding layer, and uses layer-wise distillation to align the student’s hidden states with corresponding teacher representations, thereby internalizing speech representations into lightweight LLM-side adaptations. Compared with cascaded and serial bridge methods, AuRA enables tighter speech-language joint modeling and efficient parallel end-to-end inference, while also reusing pretrained speech and language models rather than requiring large-scale multimodal training. On multiple speech-language benchmarks, AuRA consistently outperforms cascaded systems, speech-to-LLM adaptation baselines, and large-scale speech-language and multimodal models in both effectiveness and efficiency.
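To make the layer-wise distillation idea more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not state the distance function, which layers are paired, or how dimension mismatches are handled, so this sketch assumes mean squared error, a hypothetical `layer_map` from student layers to teacher layers, and hidden states that already share the same shape. The names `mse` and `layerwise_distillation_loss` are illustrative.

```python
from typing import Dict, List

Vector = List[float]
Hidden = List[Vector]  # one vector per frame (time step)


def mse(a: Hidden, b: Hidden) -> float:
    """Mean squared error between two equally shaped hidden-state sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same number of frames")
    total, count = 0.0, 0
    for va, vb in zip(a, b):
        if len(va) != len(vb):
            raise ValueError("frames must have the same dimension")
        for x, y in zip(va, vb):
            total += (x - y) ** 2
            count += 1
    return total / count


def layerwise_distillation_loss(
    student: List[Hidden],
    teacher: List[Hidden],
    layer_map: Dict[int, int],
) -> float:
    """Average the per-layer MSE over the paired (student, teacher) layers.

    Assumption: layer_map (student layer index -> teacher layer index) is a
    hypothetical pairing; the abstract only says "corresponding" layers.
    """
    if not layer_map:
        raise ValueError("layer_map must pair at least one layer")
    losses = [mse(student[s], teacher[t]) for s, t in layer_map.items()]
    return sum(losses) / len(losses)


# Toy example: two layers, one frame each, two dimensions per frame.
student_states = [[[1.0, 2.0]], [[0.0, 0.0]]]
teacher_states = [[[1.0, 2.0]], [[1.0, 1.0]]]
loss = layerwise_distillation_loss(student_states, teacher_states, {0: 0, 1: 1})
```

In the toy example the first layer pair matches exactly and the second differs by 1 in each dimension, so the per-layer errors are 0 and 1 and the averaged loss is 0.5. In a real setup the student side would be the LoRA-adapted LLM's hidden states and the teacher side the ASR encoder's, likely with a projection between them if their widths differ.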

Key Contributions

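As summarized in the abstract, the paper contributes:

  • AuRA, a method that distills audio encoding capability into the LLM itself, using an ASR encoder as the teacher and a LoRA-adapted LLM as the student.
  • A layer-wise distillation objective that aligns the student's hidden states with the corresponding teacher representations, with speech fed to the LLM through a lightweight audio embedding layer.
  • Reported gains over cascaded systems, speech-to-LLM adaptation baselines, and large-scale speech-language and multimodal models on multiple speech-language benchmarks, in both effectiveness and efficiency.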

Methodology

Please refer to the full paper for detailed methodology.
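As general background on the LoRA adaptation named in the title, the sketch below shows the standard LoRA forward pass for a single linear layer: the frozen weight `W` is left untouched and a trainable low-rank update `B A`, scaled by `alpha / r`, is added to its output. Whether AuRA uses exactly this parameterization, and which LLM layers it adapts, is not stated in the abstract, so treat this as an assumption. The function names and the toy matrices are illustrative.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, alpha: float) -> Vector:
    """Compute y = W x + (alpha / r) * B (A x).

    W is the frozen pretrained weight (d_out x d_in).
    A (r x d_in) and B (d_out x r) are the trainable low-rank factors.
    """
    r = len(A)  # rank of the update
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


# Toy example with d_in = d_out = 2 and rank r = 1.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
B = [[0.5], [0.0]]
y = lora_forward(W, A, B, [2.0, 3.0], alpha=2.0)
```

Here `A x` is `[5.0]`, `B (A x)` is `[2.5, 0.0]`, and the scale is 2, so `y` is `[7.0, 3.0]`. In standard LoRA `B` is initialized to zeros, so training starts from the unmodified pretrained layer.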

Practical Implications

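According to the abstract, AuRA reuses pretrained speech and language models instead of requiring large-scale multimodal training, and it supports efficient parallel end-to-end inference. This avoids the transcript-interface latency of cascaded ASR-LLM pipelines and the sequential speech-language coupling of serial bridge methods.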

Authors

  • Bo Cheng
  • Lei Shi
  • Zhanyu Ma
  • Yuan Wu
  • Jun Xu
  • Jiuchong Gao
  • Jinghua Hao
  • Renqing He

Paper Information

  • arXiv ID: 2606.11033v1
  • Categories: cs.LG, cs.AI, cs.CL
  • Published: June 9, 2026