[Paper] Atomic Fact-Checking Increases Clinician Trust in Large Language Model Recommendations for Oncology Decision Support: A Randomized Controlled Trial

Published: 5 days ago (May 5, 2026 at 12:12 PM EDT)

4 min read

Source: arXiv

Source: arXiv - 2605.03916v1

Overview

A recent randomized controlled trial examined whether “atomic fact‑checking” – breaking down an AI‑generated oncology treatment recommendation into discrete, source‑verified claims – can boost clinicians’ trust in large language model (LLM) decision‑support tools. The study found a dramatic increase in trust, suggesting a practical path to safer, more adoptable AI assistants in cancer care.

Key Contributions

Introduced atomic fact‑checking as a granular explainability technique that links each claim in an LLM recommendation to the exact guideline or evidence source.
Ran a large‑scale RCT with 356 oncology clinicians who evaluated 7,476 AI‑generated recommendations across multiple trust‑rating scenarios.
Quantified trust gains: atomic fact‑checking yielded a Cohen’s d of 0.94 (large effect), raising clinician trust from 27 % to 66 %.
Compared against traditional transparency (e.g., citation lists, confidence scores), which showed modest improvements (d = 0.25–0.50).
Provided a reproducible evaluation framework for future AI‑clinician interaction studies.

Methodology

Participants & Setting – 356 board‑certified oncologists and oncology fellows from several hospitals were recruited online.
AI System – A state‑of‑the‑art LLM was fine‑tuned on oncology guidelines (e.g., NCCN, ESMO) to generate treatment recommendations for a set of realistic patient cases.
Explainability Conditions – Each recommendation was presented under one of four conditions:
- Baseline – plain text recommendation, no explanation.
- Citation List – a bibliography of guideline documents.
- Confidence Scores – per‑recommendation probability estimates.
- Atomic Fact‑Checking – the recommendation split into atomic statements, each accompanied by a direct link to the exact guideline paragraph that supports it.
Randomization – Cases and explainability conditions were randomly assigned per clinician to avoid learning effects.
Trust Measurement – After each recommendation, clinicians rated their trust on a 5‑point Likert scale and indicated a binary “trust / don’t trust” decision.
Statistical Analysis – Effect sizes (Cohen’s d) and proportion differences were computed; mixed‑effects models accounted for repeated measures per participant.

Results & Findings

Trust Boost – Atomic fact‑checking raised the “trust” proportion from 26.9 % (baseline) to 66.5 %, a Δ = 39.6 % absolute increase.
Effect Size – Cohen’s d = 0.94 (large) for atomic fact‑checking, versus 0.25–0.50 for traditional transparency methods.
Consistency Across Sub‑groups – The trust uplift held for both senior oncologists and trainees, and across tumor types (solid vs. hematologic).
Decision Quality – While the study focused on trust, a secondary analysis showed no degradation in recommendation accuracy when atomic fact‑checking was used.
User Feedback – Clinicians reported that being able to “click through” each claim to the original guideline made the AI feel more like a “trusted colleague” rather than a black‑box.

Practical Implications

Design Blueprint – AI vendors building clinical decision‑support tools can adopt atomic fact‑checking as a default UI pattern: present each recommendation as a list of verifiable statements with direct guideline links.
Regulatory Alignment – The approach satisfies emerging “explainability‑by‑design” requirements from bodies like the FDA and EMA, potentially smoothing the path to market approval.
Reduced Adoption Friction – Higher trust translates to greater willingness to incorporate AI suggestions into multidisciplinary tumor boards, speeding up workflow and standardizing care.
Extensible Beyond Oncology – The same atomic decomposition can be applied to other high‑stakes domains (e.g., cardiology, radiology) where guideline‑based practice dominates.
Training & Education – Medical schools can use atomic fact‑checking interfaces to teach evidence‑based reasoning, turning AI into an interactive learning aid.

Limitations & Future Work

Scope of Guidelines – The trial used only a subset of widely adopted oncology guidelines; performance with less‑structured evidence (e.g., real‑world data) remains unknown.
Trust vs. Outcome – The study measured trust, not actual patient outcomes; future work should link increased trust to clinical efficacy and safety metrics.
Scalability of Fact‑Checking – Generating and maintaining granular claim‑source mappings at scale may require automated citation extraction pipelines, which were not evaluated here.
User Interface Variability – The experiment used a web‑based prototype; integration into EMR systems could introduce new usability challenges.

Bottom line: By turning every AI recommendation into a set of “look‑it‑up” facts, atomic fact‑checking dramatically improves clinicians’ confidence in LLM‑driven oncology support—paving the way for more trustworthy, regulator‑ready AI assistants in healthcare.*

Authors

Lisa C. Adams
Linus Marx
Erik Thiele Orberg
Keno Bressem
Sebastian Ziegelmayer
Denise Bernhardt
Markus Graf
Marcus R. Makowski
Stephanie E. Combs
Florian Matthes
Jan C. Peeken

Paper Information

arXiv ID: 2605.03916v1
Categories: cs.CL, cs.AI
Published: May 5, 2026
PDF: Download PDF

[Paper] Atomic Fact-Checking Increases Clinician Trust in Large Language Model Recommendations for Oncology Decision Support: A Randomized Controlled Trial

Overview

Key Contributions

Methodology

Results & Findings

Practical Implications

Limitations & Future Work

Authors

Paper Information

Related posts

[Paper] The Memory Curse: How Expanded Recall Erodes Cooperative Intent in LLM Agents

[Paper] CA-SQL: Complexity-Aware Inference Time Reasoning for Text-to-SQL via Exploration and Compute Budget Allocation

[Paper] Fast Byte Latent Transformer

[Paper] Position: Mechanistic Interpretability Must Disclose Identification Assumptions for Causal Claims