[Paper] Adaptive Turn-Taking for Real-time Multi-Party Voice Agents

Published: June 11, 2026 at 12:27 PM EDT
Source: arXiv

Overview

Turn-taking in multi-party spoken conversations remains a fundamental challenge for voice-based agents, particularly under dynamic floor competition and varying user expectations. We propose ModeratorLM, a role-playing voice agent that conditions turn-taking behavior on an explicitly assigned role in multi-party settings. The system is built on a speech large language model operating in a chunk-wise streaming manner. We further introduce a reasoning-augmented variant that incorporates chain-of-thought reasoning over conversational context and the assigned role. We construct RolePlayConv, a large-scale synthetic dataset of spoken multi-party conversations with diverse assistant roles. Experiments on real-world meeting data and RolePlayConv show that turn-taking precision improves by over 40% and recall by more than 70%, with substantially fewer false-positive interruptions than non-role-conditioned baselines.
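The results above are stated as precision, recall, and false-positive interruptions. The abstract does not say how these are computed, so the following is only a minimal sketch under the assumption that each decision point carries a binary label (True means the agent should take the turn). The function name `turn_taking_metrics` and the example labels are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def turn_taking_metrics(
    predicted: Sequence[bool], reference: Sequence[bool]
) -> Tuple[float, float, int]:
    """Return (precision, recall, false_positives) for binary turn decisions.

    Assumption: one label per decision point, True = "agent takes the turn".
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")

    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)        # spoke when it should
    fp = sum(1 for p, r in pairs if p and not r)    # spoke when it should not
    fn = sum(1 for p, r in pairs if not p and r)    # stayed silent when it should speak

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, fp


# Made-up labels for five decision points, for illustration only.
predicted = [True, False, True, True, False]
reference = [True, False, False, True, True]
precision, recall, false_positives = turn_taking_metrics(predicted, reference)
print(f"precision={precision:.2f} recall={recall:.2f} false_positives={false_positives}")
```

In this toy example the agent speaks three times, two of them correctly, and misses one turn, so precision and recall are both 2/3 with one false-positive interruption.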

Key Contributions

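According to the abstract, the paper contributes:

  • ModeratorLM, a role-playing voice agent that conditions turn-taking behavior on an explicitly assigned role in multi-party settings
  • A reasoning-augmented variant that applies chain-of-thought reasoning over conversational context and the assigned role
  • RolePlayConv, a large-scale synthetic dataset of spoken multi-party conversations with diverse assistant roles
  • An evaluation on real-world meeting data and RolePlayConv against non-role-conditioned baselines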

Methodology

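The abstract states only that ModeratorLM is built on a speech large language model operating in a chunk-wise streaming manner, with a reasoning-augmented variant that adds chain-of-thought reasoning over conversational context and the assigned role. Please refer to the full paper for the detailed methodology.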
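To illustrate what chunk-wise, role-conditioned turn-taking could look like in code, here is a toy sketch. It is not the authors' implementation: the speech large language model is replaced by a hand-written rule, and every name in it (`Chunk`, `Role`, `toy_policy`, `stream_decisions`, the `triggers` field) is a hypothetical placeholder. It only shows the control flow of making one "speak" or "wait" decision per incoming chunk, given an assigned role.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Chunk:
    speaker: str  # empty string marks a silent chunk
    text: str     # hypothetical transcript of the chunk


@dataclass(frozen=True)
class Role:
    name: str
    triggers: Tuple[str, ...]  # hypothetical cue words that invite this role to speak


Policy = Callable[[Role, List[Chunk]], str]


def toy_policy(role: Role, history: List[Chunk]) -> str:
    """Stand-in for the model: returns "speak" or "wait" for the latest chunk."""
    latest = history[-1]
    if latest.speaker:        # someone holds the floor, so do not interrupt
        return "wait"
    if len(history) < 2:      # silence with no prior context
        return "wait"
    previous = history[-2].text.lower()
    return "speak" if any(t in previous for t in role.triggers) else "wait"


def stream_decisions(
    role: Role, chunks: Iterable[Chunk], policy: Policy = toy_policy
) -> List[str]:
    """Feed chunks one at a time and record one decision per chunk."""
    history: List[Chunk] = []
    decisions: List[str] = []
    for chunk in chunks:
        history.append(chunk)
        decisions.append(policy(role, history))
    return decisions


# Made-up meeting excerpt, for illustration only.
meeting = [
    Chunk("alice", "Let's get started."),
    Chunk("bob", "Moderator, who is next?"),
    Chunk("", ""),
    Chunk("alice", "I think I am."),
]
moderator = Role("moderator", ("moderator",))
note_taker = Role("note-taker", ())

print(stream_decisions(moderator, meeting))
print(stream_decisions(note_taker, meeting))
```

With the same audio stream, the moderator role takes the floor only at the silent chunk that follows a direct address, while the note-taker role never speaks. That difference is the point of conditioning the decision on the role, even though the real system makes the decision with a learned model rather than a keyword rule.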

Practical Implications

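This research falls under eess.AS (Audio and Speech Processing). The reported gains in turn-taking precision and recall, along with fewer false-positive interruptions, are most relevant to voice agents that take part in multi-party conversations such as meetings.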

Authors

  • Soumyajit Mitra
  • Prabhat Pandey
  • Abhinav Jain
  • Shanmukha Sahith
  • K V Vijay Girish

Paper Information

  • arXiv ID: 2606.13544v1
  • Categories: eess.AS, cs.AI, cs.CL
  • Published: June 11, 2026