[Paper] Ask, Answer, and Detect: Role-Playing LLMs for Personality Detection with Question-Conditioned Mixture-of-Experts
Source: arXiv - 2512.08814v1
Overview
The paper introduces ROME, a new framework that uses large language models (LLMs) to “role‑play” as a test‑taker on classic personality questionnaires (e.g., MBTI, Big‑5). By turning a user’s raw social‑media posts into simulated answers to validated psychometric items, ROME creates a transparent bridge between noisy text and abstract personality labels, dramatically improving detection accuracy while keeping the reasoning process interpretable for developers.
Key Contributions
- Psychology‑aware prompting: Leverages LLMs’ ability to answer questionnaire items, injecting domain‑specific knowledge directly into the model.
- Question‑conditioned Mixture‑of‑Experts (MoE): A lightweight routing module that jointly processes the original post and the generated question context, learning to predict questionnaire answers as an auxiliary task.
- Answer‑vector supervision: Converts the LLM‑generated answers into a structured “answer vector” that serves as rich intermediate supervision, alleviating the scarcity of labeled personality data.
- Multi‑task learning pipeline: Simultaneously trains on answer prediction and final personality classification, yielding a more robust end‑to‑end system.
- Strong empirical gains: Achieves up to 15.4 % relative improvement over the best prior methods on the public Kaggle MBTI dataset, with consistent gains on a second (Reddit Big‑5) benchmark.
Methodology
1. Data Preparation – Each user's collection of posts is paired with a set of psychometric questions (e.g., "I enjoy social gatherings").
2. Role‑Playing LLM – A pre‑trained LLM (e.g., GPT‑3.5) is prompted to answer each question as if it were the user, using the user's posts as context. This yields a question‑level answer (typically a Likert‑scale score).
3. Question‑Conditioned MoE –
   - The post text is encoded with a transformer encoder.
   - Each question is embedded as well (via the same LLM or a smaller encoder).
   - A gating network decides which expert (a small feed‑forward sub‑network) handles the interaction between a specific post–question pair.
4. Answer Vector Construction – All predicted answers are concatenated into a fixed‑size vector that directly mirrors the structure of the original questionnaire.
5. Multi‑Task Objective – Two losses are optimized jointly:
   - an answer‑prediction loss (supervised by the questionnaire's ground‑truth answers on a small labeled subset), and
   - a personality‑classification loss (standard cross‑entropy on MBTI/Big‑5 labels).
The auxiliary answer task forces the model to learn psychologically meaningful representations, which in turn improves the final label prediction.
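The role‑playing step can be sketched as a prompt template plus a tiny response parser. Everything here is illustrative, not the paper's actual prompt: `build_roleplay_prompt` and `parse_likert` are hypothetical helpers, and the mock reply stands in for a real call to whatever LLM client is in use.

```python
def build_roleplay_prompt(posts, item):
    """Assemble a role-play prompt asking an LLM to answer one
    questionnaire item (1-5 Likert) as if it were the post author."""
    context = "\n".join(f"- {p}" for p in posts)
    return (
        "You are role-playing as the author of these social-media posts:\n"
        f"{context}\n\n"
        f'Questionnaire item: "{item}"\n'
        "As this person, answer on a 1-5 Likert scale "
        "(1 = strongly disagree, 5 = strongly agree). "
        "Reply with a single digit."
    )

def parse_likert(reply):
    """Extract the first digit 1-5 from the LLM reply; None if absent."""
    for ch in reply:
        if ch in "12345":
            return int(ch)
    return None

posts = ["Spent the whole weekend reading alone, bliss.",
         "Big parties drain me completely."]
prompt = build_roleplay_prompt(posts, "I enjoy social gatherings")
score = parse_likert("2 (mostly disagree)")  # mock LLM reply
```

In the real pipeline, `prompt` would be sent to the LLM once per questionnaire item, and the parsed scores would feed the answer vector described below.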
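The question‑conditioned routing can be sketched in NumPy with made‑up dimensions and randomly initialized weights standing in for trained parameters (the paper's exact architecture and sizes may differ): the gate scores each expert from the concatenated post and question embeddings, and each item's predicted answer is a gate‑weighted mix of expert outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K, H = 16, 4, 8          # embed dim, num experts, expert hidden dim (assumed)

W_gate = rng.normal(size=(2 * D, K)) * 0.1            # gating network
experts = [(rng.normal(size=(2 * D, H)) * 0.1,        # per-expert two-layer FFN
            rng.normal(size=(H,)) * 0.1) for _ in range(K)]

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def predict_answer(post_emb, q_emb):
    """Route one (post, question) pair through the experts."""
    x = np.concatenate([post_emb, q_emb])             # question-conditioned input
    gate = softmax(x @ W_gate)                        # expert weights, sum to 1
    outs = np.array([np.tanh(x @ W1) @ W2 for W1, W2 in experts])
    return float(gate @ outs), gate

post_emb = rng.normal(size=D)
questions = [rng.normal(size=D) for _ in range(5)]    # 5 questionnaire items

# Answer vector: one predicted score per questionnaire item.
answer_vector = np.array([predict_answer(post_emb, q)[0] for q in questions])
```

Because the gate sees the question embedding, different items can be routed to different experts, which is what lets a small set of experts specialize across questionnaire dimensions.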
Results & Findings
| Dataset | Baseline (SOTA) | ROME (ours) | Relative ↑ |
|---|---|---|---|
| Kaggle MBTI (≈ 10k users) | 71.2 % accuracy | 81.9 % | 15.4 % |
| Reddit Big‑5 (≈ 5k users) | 63.5 % F1 | 74.1 % | 16.8 % |
- Interpretability: The answer vectors expose which questionnaire items drove a particular personality prediction, a feature absent in black‑box text‑only models.
- Data Efficiency: With only 5 % of the training data labeled, ROME still outperforms baselines trained on the full set, confirming the power of the auxiliary supervision.
- Ablation: Removing the MoE routing or the answer‑prediction task drops performance by ~7 %, highlighting their complementary roles.
Practical Implications
- Personalized UX: Developers can integrate ROME into recommendation engines, chatbots, or adaptive UI systems to infer user traits from existing interaction logs without needing explicit questionnaire responses.
- Mental‑Health Tools: Clinicians can use the answer vectors as a first‑line screening aid, flagging users whose generated answers suggest risk factors (e.g., high neuroticism).
- Compliance & Transparency: Because the model outputs human‑readable questionnaire scores, it satisfies emerging AI‑explainability regulations better than opaque embeddings.
- Low‑Label Scenarios: Start‑ups with limited annotated personality data can bootstrap a high‑performing detector by fine‑tuning a generic LLM with ROME’s multi‑task setup.
- Plug‑and‑Play: The MoE component is lightweight (≈ 2 M parameters) and can be attached to any existing transformer‑based text encoder, making migration to production straightforward.
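As a back‑of‑the‑envelope sanity check on the quoted size, one assumed configuration (768‑d encoder outputs, a linear gate over the concatenated post+question vector, four experts with a 256‑unit hidden layer; none of these dimensions are from the paper) lands in the same low‑millions range as the reported ≈ 2 M figure:

```python
d, num_experts, hidden = 768, 4, 256   # assumed dims, not from the paper

gate_params = (2 * d) * num_experts + num_experts          # linear gate + bias
per_expert = (2 * d) * hidden + hidden + hidden * 1 + 1    # two-layer FFN
total = gate_params + num_experts * per_expert
print(f"{total:,} parameters")
```

The point is only that a gate plus a few small feed‑forward experts is tiny next to the hundred‑million‑parameter encoders it attaches to.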
Limitations & Future Work
- Prompt Sensitivity: The quality of generated answers depends on prompt engineering; poorly crafted prompts can introduce bias.
- Questionnaire Coverage: ROME currently assumes a fixed set of psychometric items; extending to other personality models (e.g., HEXACO) requires additional prompt design and modest retraining.
- Scalability of LLM Inference: Real‑time role‑playing with large LLMs may be costly; future work could explore distilled or adapter‑based LLMs to reduce latency.
- Cross‑Cultural Validity: The psychometric questions are primarily English‑centric; evaluating ROME on multilingual or culturally diverse corpora is an open direction.
Bottom line: ROME demonstrates that marrying LLMs’ generative strengths with classic psychological assessments yields a more data‑efficient, interpretable, and accurate personality detection pipeline—an approach that developers can adopt today to build smarter, user‑centric applications.
Authors
- Yifan Lyu
- Liang Zhang
Paper Information
- arXiv ID: 2512.08814v1
- Categories: cs.CL
- Published: December 9, 2025