[Paper] Atomic Fact-Checking Increases Clinician Trust in Large Language Model Recommendations for Oncology Decision Support: A Randomized Controlled Trial

Published: May 5, 2026 at 12:12 PM EDT

Source: arXiv - 2605.03916v1

Overview

A recent randomized controlled trial examined whether “atomic fact‑checking” – breaking down an AI‑generated oncology treatment recommendation into discrete, source‑verified claims – can boost clinicians’ trust in large language model (LLM) decision‑support tools. Trust rose from 26.9 % at baseline to 66.5 % with atomic fact‑checking (Cohen’s d = 0.94), which suggests a practical path to safer, more adoptable AI assistants in cancer care.

Key Contributions

  • Introduced atomic fact‑checking as a granular explainability technique that links each claim in an LLM recommendation to the exact guideline or evidence source.
  • Ran a large‑scale RCT with 356 oncology clinicians who evaluated 7,476 AI‑generated recommendations across multiple trust‑rating scenarios.
  • Quantified trust gains: atomic fact‑checking yielded a Cohen’s d of 0.94 (large effect), raising the share of clinicians who trusted a recommendation from 26.9 % to 66.5 %.
  • Compared against traditional transparency (e.g., citation lists, confidence scores), which showed modest improvements (d = 0.25–0.50).
  • Provided a reproducible evaluation framework for future AI‑clinician interaction studies.

Methodology

  1. Participants & Setting – 356 board‑certified oncologists and oncology fellows from several hospitals were recruited online.
  2. AI System – A state‑of‑the‑art LLM was fine‑tuned on oncology guidelines (e.g., NCCN, ESMO) to generate treatment recommendations for a set of realistic patient cases.
  3. Explainability Conditions – Each recommendation was presented under one of four conditions:
    • Baseline – plain text recommendation, no explanation.
    • Citation List – a bibliography of guideline documents.
    • Confidence Scores – per‑recommendation probability estimates.
    • Atomic Fact‑Checking – the recommendation split into atomic statements, each accompanied by a direct link to the exact guideline paragraph that supports it.
  4. Randomization – Cases and explainability conditions were randomly assigned per clinician to avoid learning effects.
  5. Trust Measurement – After each recommendation, clinicians rated their trust on a 5‑point Likert scale and indicated a binary “trust / don’t trust” decision.
  6. Statistical Analysis – Effect sizes (Cohen’s d) and proportion differences were computed; mixed‑effects models accounted for repeated measures per participant.
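
The per-clinician randomization in step 4 could be sketched roughly as follows. This is an illustration under stated assumptions, not the trial's actual procedure: the function name `assign_conditions`, the case identifiers, the seed, and the balanced round-robin allocation are all hypothetical. The source only says that cases and conditions were randomly assigned per clinician.

```python
import random

# Condition names mirror the four explainability conditions listed above.
CONDITIONS = ["baseline", "citation_list", "confidence_scores", "atomic_fact_checking"]


def assign_conditions(case_ids, clinician_seed):
    """Return a shuffled list of (case_id, condition) pairs for one clinician.

    Hypothetical scheme: shuffle the case order, then deal conditions out from
    a shuffled cycle so each condition appears about equally often.
    """
    rng = random.Random(clinician_seed)  # per-clinician seed keeps the schedule reproducible
    cases = list(case_ids)
    rng.shuffle(cases)
    order = CONDITIONS[:]
    rng.shuffle(order)
    return [(case, order[i % len(order)]) for i, case in enumerate(cases)]


# Example: eight placeholder cases for one clinician (seed chosen arbitrarily).
schedule = assign_conditions([f"case_{n:02d}" for n in range(1, 9)], clinician_seed=17)
for case, condition in schedule:
    print(case, condition)
```

Shuffling both the case order and the condition cycle means that neither a particular case nor a particular position in the session is tied to one condition, which is the property the randomization step is meant to provide.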

Results & Findings

  • Trust Boost – Atomic fact‑checking raised the “trust” proportion from 26.9 % (baseline) to 66.5 %, an absolute increase of 39.6 percentage points.
  • Effect Size – Cohen’s d = 0.94 (large) for atomic fact‑checking, versus 0.25–0.50 for traditional transparency methods.
  • Consistency Across Sub‑groups – The trust uplift held for both senior oncologists and trainees, and across tumor types (solid vs. hematologic).
  • Decision Quality – While the study focused on trust, a secondary analysis showed no degradation in recommendation accuracy when atomic fact‑checking was used.
  • User Feedback – Clinicians reported that being able to “click through” each claim to the original guideline made the AI feel more like a “trusted colleague” than a black box.
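
To make the reported numbers concrete, the sketch below shows how a percentage-point difference and a Cohen's d can be computed. It assumes the common pooled-standard-deviation form of Cohen's d; the summary does not say which variant the authors used. The Likert ratings are invented toy data, not trial data, and the function names are hypothetical.

```python
from statistics import mean, variance


def cohens_d(treatment, control):
    """Cohen's d using the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    pooled_var = ((n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / pooled_var ** 0.5


def percentage_point_change(before_pct, after_pct):
    """Absolute change between two percentages, in percentage points."""
    return after_pct - before_pct


# Trust proportions as reported in the results above.
delta = percentage_point_change(26.9, 66.5)
print(f"absolute increase: {delta:.1f} percentage points")
print(f"relative increase: {66.5 / 26.9:.2f}x")

# Toy 5-point Likert ratings, invented for illustration only (not trial data).
atomic = [4, 5, 3, 4, 4, 5, 3, 4]
baseline = [3, 3, 2, 4, 3, 2, 3, 4]
print(f"toy Cohen's d: {cohens_d(atomic, baseline):.2f}")
```

Keeping the absolute change (percentage points) separate from the relative change (a ratio) avoids the ambiguity of writing the difference as a bare percentage.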

Practical Implications

  • Design Blueprint – AI vendors building clinical decision‑support tools can adopt atomic fact‑checking as a default UI pattern: present each recommendation as a list of verifiable statements with direct guideline links.
  • Regulatory Alignment – The approach may align with emerging “explainability‑by‑design” expectations from bodies like the FDA and EMA, potentially smoothing the path to market approval.
  • Reduced Adoption Friction – Higher trust may translate to greater willingness to incorporate AI suggestions into multidisciplinary tumor boards, which could speed up workflow and help standardize care.
  • Extensible Beyond Oncology – The same atomic decomposition can be applied to other high‑stakes domains (e.g., cardiology, radiology) where guideline‑based practice dominates.
  • Training & Education – Medical schools can use atomic fact‑checking interfaces to teach evidence‑based reasoning, turning AI into an interactive learning aid.
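
The design blueprint above, a recommendation shown as a list of verifiable statements with guideline links, could be modeled along these lines. This is a minimal sketch under assumptions: `AtomicClaim`, `render_recommendation`, the placeholder claim text, the source labels, and the example.org URL are hypothetical, and flagging claims that lack a source is an added design choice that the summary does not describe.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AtomicClaim:
    text: str                         # one discrete, checkable statement
    source_label: Optional[str]       # guideline name and paragraph; None if no source was found
    source_url: Optional[str] = None  # direct link to the supporting paragraph, if available

    @property
    def verified(self) -> bool:
        return self.source_label is not None


def render_recommendation(claims: List[AtomicClaim]) -> str:
    """Render claims as a plain-text checklist with their supporting sources."""
    lines = []
    for i, claim in enumerate(claims, start=1):
        mark = "[x]" if claim.verified else "[ ]"
        source = claim.source_label or "no supporting source found"
        link = f" <{claim.source_url}>" if claim.source_url else ""
        lines.append(f"{i}. {mark} {claim.text} ({source}{link})")
    return "\n".join(lines)


# Placeholder content only; not real clinical claims or guideline references.
claims = [
    AtomicClaim("Placeholder claim A", "Guideline X, section 2.1", "https://example.org/guideline-x#2-1"),
    AtomicClaim("Placeholder claim B", None),
]
print(render_recommendation(claims))
```

Storing the source on each claim, instead of in one bibliography per recommendation, is what distinguishes this pattern from the citation-list condition: a reader can check any single statement without searching a whole document.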

Limitations & Future Work

  • Scope of Guidelines – The trial used only a subset of widely adopted oncology guidelines; performance with less‑structured evidence (e.g., real‑world data) remains unknown.
  • Trust vs. Outcome – The study measured trust, not actual patient outcomes; future work should link increased trust to clinical efficacy and safety metrics.
  • Scalability of Fact‑Checking – Generating and maintaining granular claim‑source mappings at scale may require automated citation extraction pipelines, which were not evaluated here.
  • User Interface Variability – The experiment used a web‑based prototype; integration into EMR systems could introduce new usability challenges.

Bottom line: By turning every AI recommendation into a set of “look‑it‑up” facts, atomic fact‑checking dramatically improves clinicians’ confidence in LLM‑driven oncology support, paving the way for more trustworthy, regulator‑ready AI assistants in healthcare.

Authors

  • Lisa C. Adams
  • Linus Marx
  • Erik Thiele Orberg
  • Keno Bressem
  • Sebastian Ziegelmayer
  • Denise Bernhardt
  • Markus Graf
  • Marcus R. Makowski
  • Stephanie E. Combs
  • Florian Matthes
  • Jan C. Peeken

Paper Information

  • arXiv ID: 2605.03916v1
  • Categories: cs.CL, cs.AI
  • Published: May 5, 2026