[Paper] Is Passive Expertise-Based Personalization Enough? A Case Study in AI-Assisted Test-Taking
Source: arXiv - 2511.23376v1
Overview
The paper investigates whether simply tailoring an AI assistant to a user’s expertise level (novice vs. expert) is enough to boost performance and satisfaction in task‑oriented settings. By building a “passively personalized” enterprise AI assistant and testing it with timed exams, the authors show that expertise‑based adaptation can lower perceived workload and improve how users view the assistant, yet they also uncover scenarios where more user control is needed.
Key Contributions
- Passive expertise‑based personalization prototype: An AI assistant that automatically adapts its dialogue style and help level based on a user’s declared expertise (a minimal illustrative sketch follows this list).
- Controlled user study in a high‑stakes task: Participants completed timed exams using either the passive‑personalized assistant or a baseline version with no expertise adaptation.
- Empirical evidence of workload reduction: Passive personalization led to statistically significant drops in NASA‑TLX task‑load scores.
- Improved assistant perception: Users rated the personalized assistant higher on trust, usefulness, and overall satisfaction.
- Identification of task‑specific limits: Certain exam questions (e.g., requiring creative reasoning) exposed gaps where passive personalization alone could not compensate.
- Design recommendation: A hybrid approach that blends passive (system‑driven) and active (user‑driven) personalization yields the best balance of efficiency and agency.
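The paper does not ship the assistant’s implementation, but a passive layer like this can be pictured as a mapping from declared expertise to response‑shaping parameters (the verbosity, hint granularity, and confidence phrasing dimensions described under Methodology). The Python sketch below is a hypothetical illustration under that reading; the class and function names and the specific parameter values are invented here, not taken from the paper.

```python
# Hypothetical sketch of a passive, expertise-based personalization layer.
# The enums, fields, and values are illustrative assumptions, not the
# paper's actual implementation.
from dataclasses import dataclass
from enum import Enum


class ExpertiseLevel(Enum):
    NOVICE = "novice"
    EXPERT = "expert"


@dataclass
class ResponseStyle:
    verbosity: str            # how long and detailed answers should be
    hint_granularity: str     # how specific hints are allowed to get
    confidence_phrasing: str  # how assertively claims are worded


def style_for(level: ExpertiseLevel) -> ResponseStyle:
    """Map a declared expertise level to response-shaping parameters."""
    if level is ExpertiseLevel.NOVICE:
        return ResponseStyle(
            verbosity="detailed",
            hint_granularity="step-by-step",
            confidence_phrasing="reassuring, explicit about assumptions",
        )
    return ResponseStyle(
        verbosity="concise",
        hint_granularity="high-level pointers only",
        confidence_phrasing="direct, minimal hedging",
    )


def build_system_prompt(level: ExpertiseLevel) -> str:
    """Fold the chosen style into a system prompt for the underlying model."""
    s = style_for(level)
    return (
        f"Answer with {s.verbosity} explanations, give {s.hint_granularity} "
        f"hints, and phrase answers in a {s.confidence_phrasing} tone."
    )


if __name__ == "__main__":
    print(build_system_prompt(ExpertiseLevel.NOVICE))
```

In the study’s setup, the expertise level would be filled in from the onboarding questionnaire rather than hard‑coded as above.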
Methodology
- System Design – The researchers built two versions of an enterprise AI assistant:
  - Baseline: Uniform interaction style for all users.
  - Passive‑Personalized: The system infers the user’s expertise from a short onboarding questionnaire and automatically adjusts response verbosity, hint granularity, and confidence phrasing.
- Task Scenario – Participants (both self‑identified novices and experts) took a series of timed, multiple‑choice exams covering a domain‑specific knowledge set. The AI assistant could be queried for hints, explanations, or verification of answers.
- Study Protocol – A within‑subjects design was used: each participant used both assistant versions on separate exam blocks, with order counter‑balanced to mitigate learning effects.
- Metrics –
  - Objective: Exam accuracy and completion time.
  - Subjective: NASA‑TLX workload, System Usability Scale (SUS), and custom Likert items on trust and perceived usefulness.
- Analysis – Paired t‑tests and mixed‑effects models examined differences across expertise levels and assistant conditions.
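The analysis code is not published with the paper; the sketch below shows one plausible way to run the reported comparisons with scipy and statsmodels, assuming a long‑format table with one row per participant per condition. The column names and the SUS‑scoring helper are assumptions for illustration, not the authors’ pipeline.

```python
# Illustrative analysis sketch (not the authors' code): paired t-test and a
# mixed-effects model over per-participant, per-condition scores.
# Assumed columns: participant, condition ("baseline" / "personalized"),
# expertise ("novice" / "expert"), tlx (NASA-TLX), sus_items (10 Likert items).
import pandas as pd
from scipy.stats import ttest_rel
import statsmodels.formula.api as smf


def sus_score(items: list[int]) -> float:
    """Standard SUS scoring: ten 1-5 Likert items mapped to a 0-100 score."""
    odd = sum(items[i] - 1 for i in range(0, 10, 2))   # items 1, 3, 5, 7, 9
    even = sum(5 - items[i] for i in range(1, 10, 2))  # items 2, 4, 6, 8, 10
    return (odd + even) * 2.5


def analyze(df: pd.DataFrame) -> None:
    # Score SUS from the raw Likert items for each (participant, condition) row.
    df = df.assign(sus=df["sus_items"].apply(sus_score))

    # Paired t-test on NASA-TLX: pivot to one row per participant.
    wide = df.pivot(index="participant", columns="condition", values="tlx")
    t, p = ttest_rel(wide["baseline"], wide["personalized"])
    print(f"NASA-TLX paired t-test: t = {t:.2f}, p = {p:.4f}")

    # Mixed-effects model with a random intercept per participant, testing
    # assistant condition, expertise level, and their interaction.
    model = smf.mixedlm("tlx ~ condition * expertise", df,
                        groups=df["participant"])
    print(model.fit().summary())
```

The same pattern would apply to the SUS, trust, and usefulness scores.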
Results & Findings
| Metric | Baseline | Passive‑Personalized | Effect |
|---|---|---|---|
| NASA‑TLX (overall workload) | 58.2 | 45.7 | ↓ ≈ 22 % (p < 0.01) |
| SUS (usability) | 71.4 | 78.9 | ↑ ≈ 10 % (p < 0.05) |
| Trust rating | 3.8 / 5 | 4.3 / 5 | ↑ ≈ 13 % (p < 0.05) |
| Exam accuracy | 78 % | 80 % | ns (no significant gain) |
| Completion time | 12.4 min | 11.9 min | ns |
- Workload & perception: Participants felt the personalized assistant required less mental effort and was more trustworthy, especially among novices.
- Performance: Accuracy and speed showed modest, non‑significant improvements, suggesting that reduced workload does not automatically translate into higher scores in short, timed tests.
- Task‑specific limits: For questions demanding higher‑order reasoning, the assistant’s static hint style sometimes over‑ or under‑supported users, leading to frustration.
- User agency: When participants could override the hint level (an “active” control), satisfaction rose further, suggesting that passive personalization alone is insufficient for complex tasks.
Practical Implications
- Enterprise help desks & knowledge bases: Deploying passive expertise detection (e.g., via role tags or quick surveys) can make chat‑bots feel more attuned to users, lowering the perceived difficulty of resolving support tickets.
- Developer tooling: IDE assistants that auto‑adjust suggestion verbosity based on a developer’s experience could reduce cognitive load without requiring manual configuration.
- E‑learning platforms: Adaptive tutoring bots that start with a passive expertise profile can improve learner confidence, but should expose controls for students to request more/less detail.
- Product roadmap: Teams building task‑oriented conversational agents should plan for a dual‑personalization layer: initial passive adaptation followed by optional active toggles (e.g., “more hints”, “simplify language”); a minimal sketch of such a layer follows this list.
- Metrics to monitor: Beyond accuracy, track workload (NASA‑TLX or similar) and trust scores to gauge the real impact of personalization on user experience.
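To make the dual‑personalization recommendation concrete, here is a minimal hypothetical sketch in which passive defaults are seeded from the expertise profile and explicit user toggles override them; the settings fields and toggle strings are invented for illustration, not a published API.

```python
# Hypothetical dual-personalization layer: passive defaults, active overrides.
# Field names and toggle strings are illustrative assumptions.
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AssistantSettings:
    hint_level: str  # "step-by-step" or "high-level"
    language: str    # "plain" or "technical"


def passive_defaults(expertise: str) -> AssistantSettings:
    """System-driven starting point inferred from the user's profile."""
    if expertise == "novice":
        return AssistantSettings(hint_level="step-by-step", language="plain")
    return AssistantSettings(hint_level="high-level", language="technical")


def apply_toggle(settings: AssistantSettings, toggle: str) -> AssistantSettings:
    """User-driven override, e.g. from a UI button or a chat command."""
    if toggle == "more hints":
        return replace(settings, hint_level="step-by-step")
    if toggle == "simplify language":
        return replace(settings, language="plain")
    return settings  # unknown toggles leave the settings unchanged


# Example: an expert who asks for more hints mid-task.
settings = passive_defaults("expert")
settings = apply_toggle(settings, "more hints")
print(settings)
```

The point of the design is that the override is cheap and reversible, so users keep agency without having to configure the assistant up front.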
Limitations & Future Work
- Scope of tasks: The study focused on multiple‑choice exams; results may differ for open‑ended or collaborative tasks.
- Short‑term exposure: Participants interacted with the assistant for a single session; long‑term adaptation effects remain unknown.
- Expertise inference: The current passive model relies on a self‑reported questionnaire; richer signals (e.g., interaction history, performance analytics) could improve accuracy (a speculative sketch follows this list).
- Active personalization exploration: Future work should systematically compare pure passive, pure active, and hybrid approaches across diverse domains to refine guidelines for optimal agency balance.
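As a purely speculative illustration of the richer‑signals idea above, the sketch below blends a self‑reported level with two behavioral features; the features, weights, and threshold are invented and would need empirical validation.

```python
# Hypothetical expertise estimator combining self-report with interaction
# signals; the features, weights, and threshold are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class InteractionSignals:
    hint_requests: int   # how often the user asked the assistant for hints
    correct_rate: float  # fraction of recent answers that were correct


def estimate_expertise(self_report: str, signals: InteractionSignals) -> str:
    """Blend a declared level with observed behavior into a working estimate."""
    score = 1.0 if self_report == "expert" else 0.0
    if signals.correct_rate > 0.8:
        score += 1.0
    if signals.hint_requests > 5:
        score -= 0.5
    return "expert" if score >= 1.5 else "novice"


print(estimate_expertise("expert", InteractionSignals(hint_requests=2,
                                                      correct_rate=0.9)))
```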
Bottom line: Passive expertise‑based personalization can make AI assistants feel lighter and more trustworthy, but to truly maximize performance and user satisfaction, developers should give users a simple way to steer the assistant’s behavior when the task demands it.
Authors
- Li Siyan
- Jason Zhang
- Akash Maharaj
- Yuanming Shi
- Yunyao Li
Paper Information
- arXiv ID: 2511.23376v1
- Categories: cs.HC, cs.CL
- Published: November 28, 2025