[Paper] Bug Detective and Quality Coach: Developers' Mental Models of AI-Assisted IDE Tools
Source: arXiv - 2511.21197v1
Overview
The paper Bug Detective and Quality Coach investigates how developers think about AI‑assisted features inside their IDEs—specifically tools that flag bugs and assess code readability. By surfacing developers’ mental models, the authors reveal why trust, control, and adoption of these helpers often hinge on subtle design choices rather than raw technical performance.
Key Contributions
- Empirical insight: Six co‑design workshops with 58 professional developers uncovered two dominant mental models—bug detectives (critical‑issue alerts) and quality coaches (personalized readability guidance).
- Design taxonomy: A set of concrete design principles for Human‑Centered AI in IDEs, balancing disruption vs. support, brevity vs. depth, and automation vs. agency.
- Trust factors: Identification of the three pillars that drive trust for both tool types—clear explanations, appropriate timing, and user‑controlled interaction.
- Methodological blueprint: Demonstrates a scalable workshop‑based approach for eliciting mental models of AI tools from practitioners.
Methodology
The researchers ran six co‑design workshops (≈2 hours each) with developers spanning a range of industries and experience levels. Participants were asked to:
- Sketch how they imagined an ideal AI bug‑detector or readability coach.
- Discuss scenarios where such tools would help or hinder their workflow.
- Prioritize features (e.g., explanation detail, notification timing, configurability).
The sessions were recorded, transcribed, and analyzed using thematic coding to surface recurring concepts and divergent expectations. This qualitative approach kept the focus on mental models—the internal representations developers hold about how the AI works and what it should do.
Results & Findings
| Aspect | Bug‑Detection Tools (“Bug Detectives”) | Readability Tools (“Quality Coaches”) |
|---|---|---|
| Core role | Warn only about critical defects; act as a safety net. | Offer continuous, contextual advice to improve style and maintainability. |
| Desired output | Concise, actionable alerts with confidence scores. | Progressive, personalized suggestions that adapt to the developer’s style. |
| Trust drivers | Transparent reasoning, clear severity ranking, ability to dismiss or snooze alerts. | Explainable rationale, timing that aligns with coding flow, fine‑grained control over suggestion granularity. |
| User control | “Turn on/off” per file/project; set severity thresholds. | Configurable coaching style (e.g., strict vs. lenient), ability to accept/reject suggestions individually. |
| Feedback loop | Immediate feedback on false positives improves trust. | Long‑term metrics (e.g., reduced cyclomatic complexity) reinforce perceived value. |
The authors distilled seven design principles, such as “Explain before you act,” “Let the developer stay in the driver’s seat,” and “Surface only what matters now.” These principles aim to prevent AI from becoming a noisy distraction while still delivering high‑value assistance.
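To make these principles concrete, here is a minimal sketch of how a "bug detective" extension might gate its alerts against user‑set thresholds. All type and function names are hypothetical illustrations, not an API from the paper:

```typescript
// Hypothetical types; the paper prescribes design principles, not an API.
type Severity = "info" | "warning" | "critical";

interface Finding {
  id: string;
  message: string;    // concise, actionable alert text
  rationale: string;  // "Explain before you act": reasoning shown with the alert
  severity: Severity;
  confidence: number; // 0..1, surfaced so the developer can judge the alert
}

interface DetectivePrefs {
  minSeverity: Severity; // user-set threshold: "Surface only what matters now"
  minConfidence: number; // suppress low-confidence guesses
  snoozed: Set<string>;  // alerts the developer has dismissed or snoozed
}

const rank: Record<Severity, number> = { info: 0, warning: 1, critical: 2 };

// Keep the developer in the driver's seat: the tool filters, the user decides.
function visibleFindings(all: Finding[], prefs: DetectivePrefs): Finding[] {
  return all.filter(
    (f) =>
      rank[f.severity] >= rank[prefs.minSeverity] &&
      f.confidence >= prefs.minConfidence &&
      !prefs.snoozed.has(f.id)
  );
}
```

Filtering here happens entirely against the developer's own preferences, so dismissing or snoozing an alert is a local, reversible act of control rather than a change to the underlying model.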
Practical Implications
- IDE vendors can redesign their AI extensions to adopt the detective/coach metaphors, making UI language and visual cues align with developers’ expectations.
- Tool builders should prioritize explainability (e.g., inline rationale, confidence levels) and configurability (severity thresholds, coaching intensity) to boost adoption.
- Team leads can set policies that let developers calibrate AI assistance per project (see the settings sketch after this list), reducing the “one‑size‑fits‑all” friction that often leads to tool abandonment.
- Continuous integration pipelines could integrate the “bug detective” mode to surface only show‑stopper issues, while the “quality coach” could be hooked into code‑review bots that provide style suggestions over time.
- Developer onboarding: New hires can be introduced to AI helpers as mentors rather than gatekeepers, smoothing the learning curve and fostering trust early on.
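The per‑project calibration suggested above might be expressed as a settings schema like the following. The field names and values are illustrative assumptions, not a schema from the paper or any specific IDE:

```typescript
// Hypothetical per-project settings; the paper argues for this kind of
// calibration but does not define a concrete configuration format.
interface AssistantConfig {
  bugDetective: {
    enabled: boolean; // "turn on/off" per file or project
    minSeverity: "info" | "warning" | "critical";
  };
  qualityCoach: {
    enabled: boolean;
    style: "strict" | "lenient";   // configurable coaching intensity
    maxSuggestionsPerFile: number; // keep advice from becoming noise
  };
}

// Example: a team policy that surfaces only show-stopper issues
// while the coach gives gentle, low-volume style feedback.
const projectDefaults: AssistantConfig = {
  bugDetective: { enabled: true, minSeverity: "critical" },
  qualityCoach: { enabled: true, style: "lenient", maxSuggestionsPerFile: 3 },
};
```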
Limitations & Future Work
- Sample bias: All participants were recruited from a limited set of companies and may not represent the full spectrum of developer cultures (e.g., open‑source contributors, junior programmers).
- Workshop scope: The co‑design setting captures idealized expectations; real‑world usage may reveal additional friction points.
- Tool diversity: The study focused on generic bug‑detection and readability features; extending the framework to other AI‑assisted tasks (e.g., test generation, refactoring) remains open.
Future research directions include longitudinal field studies to validate whether the proposed design principles actually improve trust and productivity, and extensions of the mental‑model framework to cover emerging AI capabilities such as code synthesis and automated debugging.
Authors
- Paolo Buono
- Mary Cerullo
- Stefano Cirillo
- Giuseppe Desolda
- Francesco Greco
- Emanuela Guglielmi
- Grazia Margarella
- Giuseppe Polese
- Simone Scalabrino
- Cesare Tucci
Paper Information
- arXiv ID: 2511.21197v1
- Categories: cs.SE, cs.HC
- Published: November 26, 2025
- PDF: https://arxiv.org/pdf/2511.21197v1