[Paper] Learning User Simulators with Turing Rewards

Published: June 17, 2026 at 01:58 PM EDT

Source: arXiv - 2606.19336v1

Overview

Learning to simulate human users in interactive settings could advance the training of agent assistants, the evaluation of personalization systems, research in the social sciences, and more. Existing approaches typically train a large language model (LLM) to match a single ground-truth response, either by maximizing its log probability or by optimizing a similarity reward. We instead propose Turing-RL, a Turing-test-based reinforcement learning approach for training user simulator models. Turing-RL uses a discriminative Turing reward: an LLM judge, given the user's history, scores how indistinguishable a generated response is from the real user's response. Trained with this reward, the user simulator LLM learns to produce responses indistinguishable from what the user could have said. Across two domains, conversational chat and Reddit forum discussion, we find that Turing-RL consistently outperforms baseline methods on both LLM-based and human evaluation metrics. Our study suggests that optimizing for indistinguishability, rather than response matching, is an effective way to learn user simulators.
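The abstract does not specify the judge prompt or how the judge's verdict becomes a number. As an illustration only, the sketch below assumes a pairwise setup: the judge sees the user's history plus two candidates (the generated response and the real one) and names which it believes the real user wrote. The reward is the fraction of the two presentation orders in which the judge picks the generated response. The names `turing_reward` and `overlap_judge` are hypothetical, and `overlap_judge` is a toy word-overlap heuristic standing in for an LLM judge.

```python
from typing import Callable, Sequence

# Hypothetical judge interface: given the user's history and two candidate
# responses, return 0 or 1, the index of the one it believes the real user wrote.
Judge = Callable[[Sequence[str], str, str], int]


def turing_reward(history: Sequence[str], generated: str, real: str, judge: Judge) -> float:
    """Fraction of presentation orders in which the judge mistakes the
    generated response for the real one (0.0, 0.5, or 1.0)."""
    # Order 1: generated response shown first (index 0).
    fooled_first = judge(history, generated, real) == 0
    # Order 2: generated response shown second (index 1).
    fooled_second = judge(history, real, generated) == 1
    # Averaging over both orders means a judge that always picks the same
    # position yields a neutral 0.5 instead of a spurious 0 or 1.
    return (int(fooled_first) + int(fooled_second)) / 2.0


def overlap_judge(history: Sequence[str], a: str, b: str) -> int:
    """Toy stand-in for an LLM judge (assumption): picks the candidate sharing
    more words with the history and breaks ties in favor of position 0."""
    hist = set(" ".join(history).lower().split())
    score_a = len(hist & set(a.lower().split()))
    score_b = len(hist & set(b.lower().split()))
    return 0 if score_a >= score_b else 1


if __name__ == "__main__":
    history = ["I love hiking in the mountains"]
    real = "Hiking in the rain is still fun"
    for generated in [
        "I prefer spreadsheets",                   # judge always spots it: 0.0
        "The mountains are great for hiking",      # tie, position bias only: 0.5
        "Hiking in the mountains is what I love",  # judge always fooled: 1.0
    ]:
        print(generated, "->", turing_reward(history, generated, real, overlap_judge))
```

Unlike a similarity reward, this score never compares the generated text to the ground truth directly; it only asks whether a judge can tell the two apart, which is the distinction the abstract draws.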

Key Contributions

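This paper presents research in cs.CL (Computation and Language). Based on the abstract, its main contributions are Turing-RL, a reinforcement learning approach that trains user simulator LLMs with a discriminative Turing reward instead of a response-matching objective, and an evaluation in two domains (conversational chat and Reddit forum discussion) in which Turing-RL consistently outperforms baseline methods on both LLM-based and human evaluation metrics.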

Methodology

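Turing-RL trains a user simulator LLM with reinforcement learning. The reward comes from an LLM judge that, given the user's history, scores how indistinguishable a generated response is from the real user's response, rather than how closely it matches a single ground-truth reply. Please refer to the full paper for the detailed methodology.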
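The abstract does not name the reinforcement learning algorithm used with the Turing reward. As an illustration only, the sketch below assumes a GRPO-style update: sample several responses for the same user history, score each with a Turing reward, normalize the rewards within that group, and weight each response's log probability by its normalized advantage. The function names `group_advantages` and `policy_gradient_loss` are hypothetical, and the code uses plain floats so it runs without a deep learning framework; in a real trainer the log probabilities would be differentiable tensors.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within a group of responses sampled for the same
    user history (GRPO-style; an assumption, not taken from the paper)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every sample got the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


def policy_gradient_loss(logprobs: Sequence[float], advantages: Sequence[float]) -> float:
    """REINFORCE-style surrogate loss: minimizing it raises the log probability
    of responses with positive advantage and lowers it for negative ones."""
    return -sum(a * lp for a, lp in zip(advantages, logprobs)) / len(logprobs)


if __name__ == "__main__":
    # Hypothetical Turing rewards for four responses sampled for one history.
    rewards = [1.0, 0.5, 0.0, 0.5]
    print(group_advantages(rewards))  # approximately [1.414, 0.0, -1.414, 0.0]
```

Normalizing within the group means a response is rewarded for being harder to distinguish from the real user than its siblings, so no separate value model is needed in this sketch.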

Practical Implications

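The abstract identifies several areas where better user simulators could help: training agent assistants, evaluating personalization systems, and research in the social sciences.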

Authors

Yingshan Susan Wang
Cedegao E. Zhang
Linlu Qiu
Zexue He
Pengyuan Li
Alex Pentland
Roger P. Levy
Yoon Kim

Paper Information

arXiv ID: 2606.19336v1
Categories: cs.CL
Published: June 17, 2026