[Paper] Is Passive Expertise-Based Personalization Enough? A Case Study in AI-Assisted Test-Taking
Source: arXiv - 2511.23376v1
Overview
The paper investigates whether simply tailoring an AI assistant to a user’s expertise level (novice vs. expert) is enough to boost performance and satisfaction in task‑oriented settings. By building a “passively personalized” enterprise AI assistant and testing it with timed exams, the authors show that expertise‑based adaptation can lower perceived workload and improve how users view the assistant, yet they also uncover scenarios where more user control is needed.
Key Contributions
- Passive expertise‑based personalization prototype: An AI assistant that automatically adapts its dialogue style and help level based on a user’s declared expertise (a minimal illustrative sketch follows this list).
- Controlled user study in a high‑stakes task: Participants completed timed exams using either the passive‑personalized assistant or a baseline version with no expertise adaptation.
- Empirical evidence of workload reduction: Passive personalization led to statistically significant drops in NASA‑TLX task‑load scores.
- Improved assistant perception: Users rated the personalized assistant higher on trust, usefulness, and overall satisfaction.
- Identification of task‑specific limits: Certain exam questions (e.g., requiring creative reasoning) exposed gaps where passive personalization alone could not compensate.
- Design recommendation: A hybrid approach that blends passive (system‑driven) and active (user‑driven) personalization yields the best balance of efficiency and agency.
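The paper does not ship the assistant’s implementation, but a passive layer like this can be pictured as a mapping from declared expertise to response‑shaping parameters (the verbosity, hint granularity, and confidence phrasing dimensions described under Methodology). The Python sketch below is a hypothetical illustration under that reading; the class and function names and the specific parameter values are invented here, not taken from the paper.

```python
# Hypothetical sketch of a passive, expertise-based personalization layer.
# The enums, fields, and values are illustrative assumptions, not the
# paper's actual implementation.
from dataclasses import dataclass
from enum import Enum


class ExpertiseLevel(Enum):
    NOVICE = "novice"
    EXPERT = "expert"


@dataclass
class ResponseStyle:
    verbosity: str            # how long and detailed answers should be
    hint_granularity: str     # how specific hints are allowed to get
    confidence_phrasing: str  # how assertively claims are worded


def style_for(level: ExpertiseLevel) -> ResponseStyle:
    """Map a declared expertise level to response-shaping parameters."""
    if level is ExpertiseLevel.NOVICE:
        return ResponseStyle(
            verbosity="detailed",
            hint_granularity="step-by-step",
            confidence_phrasing="reassuring, explicit about assumptions",
        )
    return ResponseStyle(
        verbosity="concise",
        hint_granularity="high-level pointers only",
        confidence_phrasing="direct, minimal hedging",
    )


def build_system_prompt(level: ExpertiseLevel) -> str:
    """Fold the chosen style into a system prompt for the underlying model."""
    s = style_for(level)
    return (
        f"Answer with {s.verbosity} explanations, give {s.hint_granularity} "
        f"hints, and phrase answers in a {s.confidence_phrasing} tone."
    )


if __name__ == "__main__":
    print(build_system_prompt(ExpertiseLevel.NOVICE))
```

In the study’s setup, the expertise level would be filled in from the onboarding questionnaire rather than hard‑coded as above.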
Methodology
- System Design – The researchers built two versions of an enterprise AI assistant:
  - Baseline: Uniform interaction style for all users.
  - Passive‑Personalized: The system infers the user’s expertise from a short onboarding questionnaire and automatically adjusts response verbosity, hint granularity, and confidence phrasing.
- Task Scenario – Participants (both self‑identified novices and experts) took a series of timed, multiple‑choice exams covering a domain‑specific knowledge set. The AI assistant could be queried for hints, explanations, or verification of answers.
- Study Protocol – A within‑subjects design was used: each participant used both assistant versions on separate exam blocks, with order counter‑balanced to mitigate learning effects.
- Metrics –
  - Objective: Exam accuracy and completion time.
  - Subjective: NASA‑TLX workload, System Usability Scale (SUS), and custom Likert items on trust and perceived usefulness.
- Analysis – Paired t‑tests and mixed‑effects models examined differences across expertise levels and assistant conditions.
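The analysis code is not published with the paper; the sketch below shows one plausible way to run the reported comparisons with scipy and statsmodels, assuming a long‑format table with one row per participant per condition. The column names and the SUS‑scoring helper are assumptions for illustration, not the authors’ pipeline.

```python
# Illustrative analysis sketch (not the authors' code): paired t-test and a
# mixed-effects model over per-participant, per-condition scores.
# Assumed columns: participant, condition ("baseline" / "personalized"),
# expertise ("novice" / "expert"), tlx (NASA-TLX), sus_items (10 Likert items).
import pandas as pd
from scipy.stats import ttest_rel
import statsmodels.formula.api as smf


def sus_score(items: list[int]) -> float:
    """Standard SUS scoring: ten 1-5 Likert items mapped to a 0-100 score."""
    odd = sum(items[i] - 1 for i in range(0, 10, 2))   # items 1, 3, 5, 7, 9
    even = sum(5 - items[i] for i in range(1, 10, 2))  # items 2, 4, 6, 8, 10
    return (odd + even) * 2.5


def analyze(df: pd.DataFrame) -> None:
    # Score SUS from the raw Likert items for each (participant, condition) row.
    df = df.assign(sus=df["sus_items"].apply(sus_score))

    # Paired t-test on NASA-TLX: pivot to one row per participant.
    wide = df.pivot(index="participant", columns="condition", values="tlx")
    t, p = ttest_rel(wide["baseline"], wide["personalized"])
    print(f"NASA-TLX paired t-test: t = {t:.2f}, p = {p:.4f}")

    # Mixed-effects model with a random intercept per participant, testing
    # assistant condition, expertise level, and their interaction.
    model = smf.mixedlm("tlx ~ condition * expertise", df,
                        groups=df["participant"])
    print(model.fit().summary())
```

The same pattern would apply to the SUS, trust, and usefulness scores.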
Results & Findings
| Metric | Baseline | Passive‑Personalized | Effect |
|---|---|---|---|
| NASA‑TLX (overall workload) | 58.2 | 45.7 | ↓ ≈ 22 % (p < 0.01) |
| SUS (usability) | 71.4 | 78.9 | ↑ ≈ 10 % (p < 0.05) |
| Trust rating | 3.8 / 5 | 4.3 / 5 | ↑ ≈ 13 % (p < 0.05) |
| Exam accuracy | 78 % | 80 % | ns (no significant gain) |
| Completion time | 12.4 min | 11.9 min | ns |
- Workload & perception: Participants felt the personalized assistant required less mental effort and was more trustworthy, especially among novices.
- Performance: Accuracy and speed showed modest, non‑significant improvements, suggesting that reduced workload does not automatically translate into higher scores in short, timed tests.
- Task‑specific limits: For questions demanding higher‑order reasoning, the assistant’s static hint style sometimes over‑ or under‑supported users, leading to frustration.
- User agency: When participants could override the hint level (an “active” control), satisfaction rose further, suggesting that passive personalization alone is insufficient for complex tasks.
Practical Implications
- Enterprise help desks & knowledge bases: Deploying passive expertise detection (e.g., via role tags or quick surveys) can make chat‑bots feel more attuned to users, lowering the perceived difficulty of resolving support tickets.
- Developer tooling: IDE assistants that auto‑adjust suggestion verbosity based on a developer’s experience could reduce cognitive load without requiring manual configuration.
- E‑learning platforms: Adaptive tutoring bots that start with a passive expertise profile can improve learner confidence, but should expose controls for students to request more/less detail.
- Product roadmap: Teams building task‑oriented conversational agents should plan for a dual‑personalization layer: initial passive adaptation followed by optional active toggles (e.g., “more hints”, “simplify language”); a minimal sketch of such a layer follows this list.
- Metrics to monitor: Beyond accuracy, track workload (NASA‑TLX or similar) and trust scores to gauge the real impact of personalization on user experience.
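To make the dual‑personalization recommendation concrete, here is a minimal hypothetical sketch in which passive defaults are seeded from the expertise profile and explicit user toggles override them; the settings fields and toggle strings are invented for illustration, not a published API.

```python
# Hypothetical dual-personalization layer: passive defaults, active overrides.
# Field names and toggle strings are illustrative assumptions.
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AssistantSettings:
    hint_level: str  # "step-by-step" or "high-level"
    language: str    # "plain" or "technical"


def passive_defaults(expertise: str) -> AssistantSettings:
    """System-driven starting point inferred from the user's profile."""
    if expertise == "novice":
        return AssistantSettings(hint_level="step-by-step", language="plain")
    return AssistantSettings(hint_level="high-level", language="technical")


def apply_toggle(settings: AssistantSettings, toggle: str) -> AssistantSettings:
    """User-driven override, e.g. from a UI button or a chat command."""
    if toggle == "more hints":
        return replace(settings, hint_level="step-by-step")
    if toggle == "simplify language":
        return replace(settings, language="plain")
    return settings  # unknown toggles leave the settings unchanged


# Example: an expert who asks for more hints mid-task.
settings = passive_defaults("expert")
settings = apply_toggle(settings, "more hints")
print(settings)
```

The point of the design is that the override is cheap and reversible, so users keep agency without having to configure the assistant up front.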
Limitations & Future Work
- Scope of tasks: The study focused on multiple‑choice exams; results may differ for open‑ended or collaborative tasks.
- Short‑term exposure: Participants interacted with the assistant for a single session; long‑term adaptation effects remain unknown.
- Expertise inference: The current passive model relies on a self‑reported questionnaire; richer signals (e.g., interaction history, performance analytics) could improve accuracy (a speculative sketch follows this list).
- Active personalization exploration: Future work should systematically compare pure passive, pure active, and hybrid approaches across diverse domains to refine guidelines for optimal agency balance.
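As a purely speculative illustration of the richer‑signals idea above, the sketch below blends a self‑reported level with two behavioral features; the features, weights, and threshold are invented and would need empirical validation.

```python
# Hypothetical expertise estimator combining self-report with interaction
# signals; the features, weights, and threshold are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class InteractionSignals:
    hint_requests: int   # how often the user asked the assistant for hints
    correct_rate: float  # fraction of recent answers that were correct


def estimate_expertise(self_report: str, signals: InteractionSignals) -> str:
    """Blend a declared level with observed behavior into a working estimate."""
    score = 1.0 if self_report == "expert" else 0.0
    if signals.correct_rate > 0.8:
        score += 1.0
    if signals.hint_requests > 5:
        score -= 0.5
    return "expert" if score >= 1.5 else "novice"


print(estimate_expertise("expert", InteractionSignals(hint_requests=2,
                                                      correct_rate=0.9)))
```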
Bottom line: Passive expertise‑based personalization can make AI assistants feel lighter and more trustworthy, but to truly maximize performance and user satisfaction, developers should give users a simple way to steer the assistant’s behavior when the task demands it.
Authors
- Li Siyan
- Jason Zhang
- Akash Maharaj
- Yuanming Shi
- Yunyao Li
Paper Information
- arXiv ID: 2511.23376v1
- Categories: cs.HC, cs.CL
- Published: November 28, 2025