[Paper] Atomic Fact-Checking Increases Clinician Trust in Large Language Model Recommendations for Oncology Decision Support: A Randomized Controlled Trial
Source: arXiv - 2605.03916v1
Overview
A recent randomized controlled trial examined whether “atomic fact‑checking”, which breaks an AI‑generated oncology treatment recommendation into discrete, source‑verified claims, can boost clinicians’ trust in large language model (LLM) decision‑support tools. The proportion of “trust” decisions rose from 26.9 % at baseline to 66.5 % with atomic fact‑checking, suggesting a practical path to safer, more adoptable AI assistants in cancer care.
Key Contributions
- Introduced atomic fact‑checking as a granular explainability technique that links each claim in an LLM recommendation to the exact guideline or evidence source.
- Ran a large‑scale RCT with 356 oncology clinicians who evaluated 7,476 AI‑generated recommendations across multiple trust‑rating scenarios.
- Quantified trust gains: atomic fact‑checking yielded a Cohen’s d of 0.94 (large effect), raising the proportion of “trust” decisions from 26.9 % to 66.5 %.
- Compared against traditional transparency (e.g., citation lists, confidence scores), which showed modest improvements (d = 0.25–0.50).
- Provided a reproducible evaluation framework for future AI‑clinician interaction studies.
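To make the first contribution concrete, the link between claims and sources could be modeled roughly as follows. This is a minimal sketch, not the authors' implementation: the `AtomicClaim` and `Recommendation` classes, their fields, and the placeholder claim and source strings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AtomicClaim:
    """One discrete statement from a recommendation (hypothetical structure)."""
    text: str
    source: Optional[str] = None  # guideline name plus section; None if unverified


@dataclass
class Recommendation:
    """A recommendation decomposed into atomic claims (hypothetical structure)."""
    summary: str
    claims: List[AtomicClaim] = field(default_factory=list)

    def unsupported(self) -> List[AtomicClaim]:
        """Return the claims that have no linked source."""
        return [c for c in self.claims if c.source is None]

    def render(self) -> str:
        """Render the summary followed by a numbered list of claims and sources."""
        lines = [self.summary]
        for i, claim in enumerate(self.claims, start=1):
            tag = claim.source if claim.source is not None else "NO SOURCE FOUND"
            lines.append(f"{i}. {claim.text} [{tag}]")
        return "\n".join(lines)


# Placeholder content for illustration only; not taken from the paper or a guideline.
rec = Recommendation(
    summary="Example recommendation",
    claims=[
        AtomicClaim("Claim A", source="Guideline X, section 1"),
        AtomicClaim("Claim B"),
    ],
)
print(rec.render())
```

Keeping unverified claims visible, instead of dropping them, is a design choice in this sketch: it lets a reader see at a glance which statements lack guideline support.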
Methodology
- Participants & Setting – 356 board‑certified oncologists and oncology fellows from several hospitals were recruited online.
- AI System – A state‑of‑the‑art LLM was fine‑tuned on oncology guidelines (e.g., NCCN, ESMO) to generate treatment recommendations for a set of realistic patient cases.
- Explainability Conditions – Each recommendation was presented under one of four conditions:
  - Baseline – plain text recommendation, no explanation.
  - Citation List – a bibliography of guideline documents.
  - Confidence Scores – per‑recommendation probability estimates.
  - Atomic Fact‑Checking – the recommendation split into atomic statements, each accompanied by a direct link to the exact guideline paragraph that supports it.
- Randomization – Cases and explainability conditions were randomly assigned per clinician to avoid learning effects.
- Trust Measurement – After each recommendation, clinicians rated their trust on a 5‑point Likert scale and indicated a binary “trust / don’t trust” decision.
- Statistical Analysis – Effect sizes (Cohen’s d) and proportion differences were computed; mixed‑effects models accounted for repeated measures per participant.
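The randomization step can be sketched as below. The summary does not describe the exact assignment scheme, so the balanced cycling used here is an assumption, and the function name `assign_conditions`, the condition labels, and the case ids are hypothetical.

```python
import random
from typing import Dict, List

# Labels for the four explainability conditions described above (names are illustrative).
CONDITIONS = ["baseline", "citation_list", "confidence_scores", "atomic_fact_checking"]


def assign_conditions(case_ids: List[str], seed: int) -> Dict[str, str]:
    """Randomly pair each case with a condition for one clinician.

    Conditions are cycled over a shuffled case order, so each condition is
    used about equally often (counts differ by at most one).
    """
    rng = random.Random(seed)   # a per-clinician seed keeps the schedule reproducible
    cases = list(case_ids)
    rng.shuffle(cases)          # random presentation order
    conditions = list(CONDITIONS)
    rng.shuffle(conditions)     # random starting condition
    return {case: conditions[i % len(conditions)] for i, case in enumerate(cases)}


# Eight hypothetical cases for one clinician.
schedule = assign_conditions([f"case_{n}" for n in range(8)], seed=42)
for case, condition in schedule.items():
    print(case, condition)
```

Because Python dictionaries preserve insertion order, iterating over `schedule` also yields the randomized presentation order.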
Results & Findings
- Trust Boost – Atomic fact‑checking raised the “trust” proportion from 26.9 % (baseline) to 66.5 %, an absolute increase of 39.6 percentage points.
- Effect Size – Cohen’s d = 0.94 (large) for atomic fact‑checking, versus 0.25–0.50 for traditional transparency methods.
- Consistency Across Sub‑groups – The trust uplift held for both senior oncologists and trainees, and across tumor types (solid vs. hematologic).
- Decision Quality – While the study focused on trust, a secondary analysis showed no degradation in recommendation accuracy when atomic fact‑checking was used.
- User Feedback – Clinicians reported that being able to “click through” each claim to the original guideline made the AI feel more like a “trusted colleague” than a black box.
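The headline numbers above can be checked with simple arithmetic, and Cohen’s d can be illustrated with the textbook pooled‑standard‑deviation formula. The sketch below is an assumption about how such figures are typically computed, not the paper’s analysis code (which used mixed‑effects models), and the Likert ratings are made up for illustration.

```python
from statistics import mean, variance
from typing import Sequence


def absolute_difference_pp(p_control: float, p_treatment: float) -> float:
    """Difference between two percentages, in percentage points."""
    return p_treatment - p_control


def cohens_d(control: Sequence[float], treatment: Sequence[float]) -> float:
    """Cohen's d using a pooled standard deviation of sample variances."""
    n1, n2 = len(control), len(treatment)
    pooled_var = (
        (n1 - 1) * variance(control) + (n2 - 1) * variance(treatment)
    ) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / pooled_var ** 0.5


# Proportions reported in the summary above.
diff = absolute_difference_pp(26.9, 66.5)
ratio = 66.5 / 26.9
print(f"{diff:.1f} percentage points, {ratio:.2f}x relative")

# Made-up 5-point Likert ratings, for illustration only (not study data).
baseline_ratings = [2, 3, 2, 3, 2, 3]
atomic_ratings = [3, 4, 3, 4, 3, 4]
print(round(cohens_d(baseline_ratings, atomic_ratings), 2))
```

Reporting the change in percentage points avoids confusion with the relative change, which is roughly a 2.5‑fold increase for the same two proportions.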
Practical Implications
- Design Blueprint – AI vendors building clinical decision‑support tools can adopt atomic fact‑checking as a default UI pattern: present each recommendation as a list of verifiable statements with direct guideline links.
- Regulatory Alignment – The approach aligns with the emerging emphasis on “explainability‑by‑design” among bodies like the FDA and EMA, potentially smoothing the path to market approval.
- Reduced Adoption Friction – Higher trust translates to greater willingness to incorporate AI suggestions into multidisciplinary tumor boards, speeding up workflow and standardizing care.
- Extensible Beyond Oncology – The same atomic decomposition can be applied to other high‑stakes domains (e.g., cardiology, radiology) where guideline‑based practice dominates.
- Training & Education – Medical schools can use atomic fact‑checking interfaces to teach evidence‑based reasoning, turning AI into an interactive learning aid.
Limitations & Future Work
- Scope of Guidelines – The trial used only a subset of widely adopted oncology guidelines; performance with less‑structured evidence (e.g., real‑world data) remains unknown.
- Trust vs. Outcome – The study measured trust, not actual patient outcomes; future work should link increased trust to clinical efficacy and safety metrics.
- Scalability of Fact‑Checking – Generating and maintaining granular claim‑source mappings at scale may require automated citation extraction pipelines, which were not evaluated here.
- User Interface Variability – The experiment used a web‑based prototype; integration into EMR systems could introduce new usability challenges.
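One piece of the scalability concern, keeping claim‑to‑source links valid as guidelines are revised, could be monitored with a check like the one below. It is a sketch under stated assumptions: the index layout, the `find_stale_links` helper, and all identifiers in the sample data are hypothetical, and the study did not evaluate such a pipeline.

```python
from typing import Dict, List, Tuple

# Hypothetical guideline index: (document version, section id) -> paragraph text.
GuidelineIndex = Dict[Tuple[str, str], str]


def find_stale_links(
    claim_links: Dict[str, Tuple[str, str]],
    index: GuidelineIndex,
) -> List[str]:
    """Return ids of claims whose linked guideline section is not in the index."""
    return [cid for cid, ref in claim_links.items() if ref not in index]


# Placeholder data for illustration only.
index = {
    ("Guideline X v2", "1.1"): "paragraph text",
    ("Guideline X v2", "1.2"): "paragraph text",
}
claim_links = {
    "claim-1": ("Guideline X v2", "1.1"),
    "claim-2": ("Guideline X v1", "3.4"),  # points at a superseded version
}
print(find_stale_links(claim_links, index))
```

A check of this kind only detects links that no longer resolve; it does not verify that the linked paragraph still supports the claim, which would need a separate review step.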
Bottom line: By turning every AI recommendation into a set of “look‑it‑up” facts, atomic fact‑checking dramatically improves clinicians’ confidence in LLM‑driven oncology support, paving the way for more trustworthy, regulator‑ready AI assistants in healthcare.
Authors
- Lisa C. Adams
- Linus Marx
- Erik Thiele Orberg
- Keno Bressem
- Sebastian Ziegelmayer
- Denise Bernhardt
- Markus Graf
- Marcus R. Makowski
- Stephanie E. Combs
- Florian Matthes
- Jan C. Peeken
Paper Information
- arXiv ID: 2605.03916v1
- Categories: cs.CL, cs.AI
- Published: May 5, 2026