[Paper] From Verification Burden to Trusted Collaboration: Design Goals for LLM-Assisted Literature Reviews

Published: December 12, 2025 at 10:38 AM EST
4 min read

Source: arXiv - 2512.11661v1

Overview

Large Language Models (LLMs) are now a common “co‑author” in academic writing, but their role in literature reviews—where researchers must locate, synthesize, and cite prior work—has been little studied. This paper presents a cross‑disciplinary user study that uncovers why scholars still spend hours double‑checking AI‑generated summaries, and it proposes a concrete design framework to turn LLMs from a verification headache into a trusted research partner.

Key Contributions

  • Empirical insight: A qualitative user study with 45 researchers from STEM, social sciences, and humanities that maps current LLM‑assisted review workflows and pinpoints three core pain points (trust, verification load, tool fragmentation).
  • Design goals: Six actionable design goals (e.g., “continuous verification,” “transparent provenance”) that directly address the identified gaps.
  • High‑level framework: An architecture that couples a visual citation explorer, step‑wise verification hooks, and a human‑feedback loop to keep the LLM’s output aligned with the researcher’s intent.
  • Prototype concepts: Wireframes and interaction patterns (e.g., generation‑guided explanations, “undo‑able” citation edits) that illustrate how the framework could be realized in existing writing environments.
  • Evaluation roadmap: A set of metrics (trust score, verification time, tool‑switch count) for future quantitative studies of LLM‑assisted review tools.
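
To make these metrics concrete, the sketch below shows how they might be computed from a logged interaction session. The event names, log format, and rating scale are illustrative assumptions for this post, not the study's actual instrument.

```python
# Sketch: computing the proposed evaluation metrics from a session event log.
# Event names and the log format are illustrative assumptions, not the study's instrument.
from dataclasses import dataclass


@dataclass
class Event:
    kind: str          # e.g. "generate", "verify", "tool_switch", "trust_rating"
    seconds: float     # time spent on this event
    value: float = 0.0 # used for self-reported trust ratings on a 1-5 scale


def verification_time(log: list[Event]) -> float:
    """Total seconds spent on verification actions (fact-checking, source lookups)."""
    return sum(e.seconds for e in log if e.kind == "verify")


def tool_switch_count(log: list[Event]) -> int:
    """Number of times the participant left the primary writing environment."""
    return sum(1 for e in log if e.kind == "tool_switch")


def trust_score(log: list[Event]) -> float:
    """Mean of the self-reported trust ratings collected during the session."""
    ratings = [e.value for e in log if e.kind == "trust_rating"]
    return sum(ratings) / len(ratings) if ratings else float("nan")


# Example session: two verification passes, one app switch, one trust rating of 3.
session = [Event("generate", 40), Event("verify", 90), Event("tool_switch", 5),
           Event("verify", 60), Event("trust_rating", 0, value=3)]
print(verification_time(session), tool_switch_count(session), trust_score(session))
```

In practice, such a log could be emitted by the writing environment itself, so that trust score, verification time, and tool-switch count are captured without extra effort from participants.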

Methodology

  1. Recruitment & Diversity: 45 participants spanning five academic domains were recruited via university mailing lists and professional networks.
  2. Contextual Interviews: Researchers described their typical literature‑review pipeline, the LLM tools they currently use (ChatGPT, Claude, domain‑specific plugins), and the specific frustrations they encounter.
  3. Task‑Based Observation: Participants performed a realistic review task (identifying related work for a short research proposal) while using their preferred LLM setup. Researchers logged every “verification action” (e.g., fact‑checking a citation, switching tools).
  4. Thematic Analysis: Transcripts were coded for recurring challenges, which collapsed into the three gaps mentioned above.
  5. Design Sprint: The authors held a two‑day co‑design workshop with a subset of participants to brainstorm solutions, resulting in the six design goals and the high‑level framework.

The approach balances qualitative depth (rich user narratives) with a structured design process, making the findings actionable for product teams.

Results & Findings

  • Trust Gap: 78 % of participants doubted the factual accuracy of LLM‑generated summaries without manual checks. Implication: trust is the biggest barrier; users treat LLM output as a “draft” rather than a source.
  • Verification Overhead: On average, each participant performed 5–7 verification steps per 10 generated sentences. Implication: the time saved by LLMs is largely eaten up by fact‑checking, negating efficiency gains.
  • Tool Fragmentation: 62 % switched between at least three separate apps (LLM chat, reference manager, PDF reader). Implication: the lack of integrated workflows forces context‑switching, increasing cognitive load.
  • Design Goal Validation: Participants rated the proposed “continuous verification” and “transparent provenance” goals as the most critical (4.6/5). Implication: the six design goals align well with real user priorities.

The authors argue that a system built around these goals could cut verification steps by roughly 30 % (based on a pilot mock‑up) and raise a self‑reported trust score from 2.8 to 4.1 on a 5‑point scale.

Practical Implications

  • For Tool Builders: Embedding verification checkpoints (e.g., “show source PDF snippet”) directly into LLM chat windows can reduce the need for external fact‑checking tools.
  • For Editor and Writing‑Tool Vendors: Adding a citation graph view that updates in real time as the LLM suggests papers gives researchers a visual anchor for provenance.
  • For Researchers: A unified interface that lets you “accept, edit, or reject” AI‑generated citations with a single click can shrink the literature‑review cycle from weeks to days.
  • For Open‑Source Communities: The framework’s modular design (LLM core ↔ verification API ↔ UI layer) invites plug‑and‑play extensions—think community‑curated verification datasets or domain‑specific citation validators; a minimal interface sketch follows this list.
  • Compliance & Ethics: Transparent provenance satisfies many institutional policies that require authors to disclose AI assistance and verify source authenticity, easing legal and ethical concerns.
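
As a rough illustration of that modular split, the following sketch expresses the LLM core, verification API, and UI layer as interchangeable interfaces. The names and signatures are assumptions made for this example, not the paper's framework; the point is that each layer can be swapped (for example, a community‑curated verification dataset behind VerificationAPI) without touching the others.

```python
# Sketch: one possible decomposition of the LLM core <-> verification API <-> UI layer split.
# Names and method signatures are assumptions for illustration, not the paper's framework.
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Citation:
    claim: str                         # the generated sentence the citation supports
    title: str
    doi: str | None = None
    source_snippet: str | None = None  # provenance shown to the user


class LLMCore(Protocol):
    def draft_related_work(self, proposal: str) -> list[Citation]: ...


class VerificationAPI(Protocol):
    def verify(self, citation: Citation) -> Citation:
        """Attach a DOI and source snippet, or leave the citation marked unverified."""
        ...


class UILayer(Protocol):
    def review(self, citation: Citation) -> str:
        """Return the researcher's decision: 'accept', 'edit', or 'reject'."""
        ...


def review_loop(llm: LLMCore, verifier: VerificationAPI, ui: UILayer,
                proposal: str) -> list[Citation]:
    """Continuous verification: every generated citation is checked and reviewed before it is kept."""
    kept = []
    for citation in llm.draft_related_work(proposal):
        decision = ui.review(verifier.verify(citation))
        if decision == "accept":
            kept.append(citation)
    return kept
```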

Limitations & Future Work

  • Sample Size & Diversity: While the study spans several disciplines, 45 participants may not capture niche workflows (e.g., legal scholarship, large‑scale systematic reviews).
  • Prototype Fidelity: The presented UI concepts were low‑fidelity mock‑ups; real‑world performance (latency, integration with existing reference managers) remains untested.
  • LLM Generality: The findings are based on current GPT‑4‑class models; future multimodal or retrieval‑augmented LLMs could shift the verification landscape.

Future research directions include a large‑scale field trial of a fully integrated prototype, quantitative measurement of productivity gains, and exploration of automated provenance verification (e.g., linking generated claims to DOI‑indexed sources in real time).
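
As a hint of what automated provenance verification could look like, the sketch below resolves a generated citation's title against the public Crossref REST API and flags weak matches for manual review. The endpoint is real, but the matching heuristic and threshold are illustrative assumptions rather than the authors' proposed pipeline.

```python
# Sketch: resolve an LLM-generated citation against Crossref to check provenance.
# The similarity threshold and matching heuristic are assumptions for illustration.
import requests
from difflib import SequenceMatcher

CROSSREF_API = "https://api.crossref.org/works"


def resolve_citation(title: str, min_similarity: float = 0.9) -> dict | None:
    """Return the best-matching DOI record for a cited title, or None if no confident match."""
    resp = requests.get(CROSSREF_API,
                        params={"query.bibliographic": title, "rows": 1},
                        timeout=10)
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    if not items:
        return None
    candidate = items[0]
    candidate_title = (candidate.get("title") or [""])[0]
    similarity = SequenceMatcher(None, title.lower(), candidate_title.lower()).ratio()
    if similarity < min_similarity:
        return None  # treat weak matches as unverified rather than guessing
    return {"doi": candidate.get("DOI"),
            "title": candidate_title,
            "similarity": round(similarity, 2)}


if __name__ == "__main__":
    match = resolve_citation("Attention Is All You Need")
    print(match or "No confident DOI match; flag citation for manual verification.")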

Authors

  • Brenda Nogueira
  • Werner Geyer
  • Andrew Anderson
  • Toby Jia‑Jun Li
  • Dongwhi Kim
  • Nuno Moniz
  • Nitesh V. Chawla

Paper Information

  • arXiv ID: 2512.11661v1
  • Categories: cs.HC, cs.AI
  • Published: December 12, 2025