[Paper] Understanding Usage and Engagement in AI-Powered Scientific Research Tools: The Asta Interaction Dataset

Published: February 26, 2026 at 01:40 PM EST
5 min read
Source: arXiv

Overview

The paper introduces the Asta Interaction Dataset, a massive, anonymized log of how researchers actually use AI‑powered literature discovery and question‑answering tools. By analyzing more than 200 K queries and interaction traces from a real‑world retrieval‑augmented generation (RAG) platform, the authors reveal how scientists treat these systems as collaborative partners rather than simple search engines. The findings give developers concrete clues about designing more useful AI research assistants.

Key Contributions

  • Large‑scale, real‑world dataset: >200 K user queries and interaction logs from two deployed AI research tools, released publicly for the community.
  • Query‑intent taxonomy: A fine‑grained classification (e.g., “drafting”, “gap‑identification”, “citation‑verification”) that captures the diverse purposes of AI‑assisted research.
  • Behavioral insights: Empirical evidence that researchers issue longer, more complex queries, treat generated text as persistent artifacts, and navigate citations in non‑linear ways.
  • Experience curve analysis: Demonstrates how query specificity and citation engagement evolve as users become more familiar with the tool.
  • Design recommendations: Concrete guidelines for building AI research assistants that support drafting, iterative refinement, and citation management.
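The query-intent taxonomy can be illustrated with a toy classifier. This is a hypothetical keyword-rule sketch, not the authors' method (they derived the 12 categories from manual annotation plus embedding clustering); only the three category names above come from the paper, and the `INTENT_KEYWORDS` rules are invented for illustration.

```python
# Toy intent classifier over three of the paper's taxonomy labels.
# The keyword rules are illustrative stand-ins for the embedding-based
# clustering the authors actually used.
INTENT_KEYWORDS = {
    "drafting": ["write", "draft", "abstract", "paragraph"],
    "gap-identification": ["gap", "missing", "open problem", "underexplored"],
    "citation-verification": ["cite", "citation", "reference", "verify"],
}

def classify_intent(query: str) -> str:
    """Return the first taxonomy label whose keywords match, else 'other'."""
    q = query.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(kw in q for kw in keywords):
            return intent
    return "other"

print(classify_intent("Write a related-work paragraph on RAG evaluation"))
print(classify_intent("What papers compare method X vs Y on dataset Z?"))
```

A real system would replace the keyword rules with the semantic-embedding clustering described in the methodology, but the interface (query in, taxonomy label out) stays the same.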

Methodology

  1. Data collection – The authors instrumented two production tools (a literature discovery UI and a scientific QA interface) built on an LLM‑backed RAG architecture. All user interactions (queries, clicks, scrolls, citation expansions, and session timestamps) were logged over several months.
  2. Anonymization & preprocessing – Personal identifiers and sensitive content were stripped; queries were tokenized and normalized.
  3. Taxonomy development – A mixed‑methods approach combined manual annotation of a random query sample with clustering of semantic embeddings to derive a 12‑category intent schema.
  4. Quantitative analysis – Metrics such as query length, token diversity, session depth, citation click‑through rate, and “artifact revisitation” frequency were computed. Temporal trends were examined by segmenting users into novice, intermediate, and expert cohorts based on session count.
  5. Statistical validation – Differences across cohorts and tool types were tested with ANOVA and post‑hoc Tukey tests, ensuring results are not artifacts of random variation.
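Steps 4–5 can be sketched in miniature. The function below computes a one-way ANOVA F statistic by hand over synthetic cohort data; the cohort values and the "targeted-query rate" metric are invented for illustration, and the paper's actual pipeline additionally ran post-hoc Tukey tests.

```python
# Minimal one-way ANOVA sketch for comparing a per-user metric across
# experience cohorts. Data and cohort cutoffs are synthetic.
def one_way_anova(groups):
    """Return the F statistic for a list of sample groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: variation of cohort means.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: variation inside each cohort.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Synthetic "targeted-query rate" per user, split by session count.
novice = [0.30, 0.35, 0.28, 0.33]
intermediate = [0.40, 0.42, 0.38, 0.41]
expert = [0.52, 0.55, 0.50, 0.54]
print(round(one_way_anova([novice, intermediate, expert]), 1))
```

A large F (relative to the F distribution with k−1 and n−k degrees of freedom) indicates the cohort means differ more than within-cohort noise would explain, which is the check behind "results are not artifacts of random variation."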

Results & Findings

| Finding | What it means |
| --- | --- |
| Average query length = 12.4 tokens (vs. ~5 tokens in traditional web search) | Researchers ask more detailed, multi‑sentence questions, expecting richer context from the AI. |
| ~38 % of sessions involve “drafting” intents (e.g., asking the model to write an abstract or related‑work paragraph) | The AI is used as a writing collaborator, not just a retrieval engine. |
| Citation click‑through rate = 62 %; 27 % of users revisit the same generated answer across multiple sessions | Generated responses become “sticky” artifacts that users treat as reference material worth revisiting. |
| Experienced users (≥10 sessions) issue 22 % more targeted queries (e.g., “compare method X vs Y on dataset Z”) | Familiarity leads to more precise prompting, though keyword‑style queries persist. |
| Non‑linear navigation: 45 % of sessions jump between answer sections and cited papers, then back to the answer | Users iteratively refine their understanding, using the AI as a hub linking to primary sources. |
| Persistent “gap‑identification” queries: 15 % of all queries ask the model to highlight missing literature or open problems | The AI is leveraged for research planning and hypothesis generation. |
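Two of these metrics are simple to compute once interaction logs exist. The sketch below is a hedged illustration: the record schema and field names (`citation_clicks`, `citations_shown`) are assumptions, not the released dataset's actual format.

```python
# Hypothetical log schema: each record is one query interaction.
# Field names are illustrative, not the Asta dataset's real schema.
sessions = [
    {"query": "compare method X vs Y on dataset Z",
     "citation_clicks": 2, "citations_shown": 3},
    {"query": "open problems in long-context retrieval evaluation",
     "citation_clicks": 0, "citations_shown": 4},
    {"query": "draft a related-work paragraph on RAG",
     "citation_clicks": 1, "citations_shown": 2},
]

# Mean query length in whitespace tokens (the paper reports 12.4).
avg_len = sum(len(s["query"].split()) for s in sessions) / len(sessions)

# Citation click-through rate: fraction of shown citations clicked.
ctr = sum(s["citation_clicks"] for s in sessions) / sum(s["citations_shown"] for s in sessions)

print(round(avg_len, 2), round(ctr, 2))
```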

Practical Implications

  • Design for Drafting: UI should expose easy ways to export, edit, and version‑control AI‑generated text (e.g., markdown export, Git integration).
  • Citation Management Integration: Embed citation metadata directly into the answer UI, with one‑click import into reference managers (Zotero, Mendeley).
  • Session Persistence: Treat each answer as a first‑class artifact—allow bookmarking, tagging, and linking between answers to support the observed non‑linear workflow.
  • Prompt Guidance: Offer dynamic prompt templates that evolve with user expertise, nudging novices toward more targeted queries while still supporting exploratory, keyword‑style searches.
  • Evaluation Benchmarks: The released taxonomy and dataset give developers a realistic testbed for measuring “research‑assistant” performance beyond standard QA metrics (e.g., include citation relevance, draft quality, and user engagement).
  • Privacy‑by‑Design: Since the dataset required thorough anonymization, any production system should adopt similar safeguards when logging researcher interactions.
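The drafting and citation-management recommendations can be sketched together. The `export_answer` helper below is hypothetical (not part of either deployed tool): it renders a generated answer as a Markdown artifact with an appended BibTeX section that reference managers such as Zotero can import.

```python
# Hypothetical export helper: turn a generated answer plus its citation
# metadata into a Markdown document with a BibTeX reference section.
# The answer/citation structure is an assumption for illustration.
def export_answer(answer_md: str, citations: list) -> str:
    bibtex = "\n".join(
        "@article{{{key},\n  title={{{title}}},\n  year={{{year}}}\n}}".format(**c)
        for c in citations
    )
    return f"{answer_md}\n\n## References\n\n{bibtex}\n"

doc = export_answer(
    "RAG systems couple retrieval with generation ...",
    [{"key": "lewis2020rag", "title": "Retrieval-Augmented Generation", "year": 2020}],
)
print(doc)
```

Treating the exported file as the versioned artifact (e.g., committing it to Git) matches the observed behavior of users revisiting generated answers across sessions.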

Limitations & Future Work

  • Domain bias: The data comes from a single RAG platform focused on life‑science literature, so patterns may differ in other fields (e.g., CS, humanities).
  • Self‑selection: Users who opted into the tool may be more tech‑savvy, potentially inflating the prevalence of advanced prompting behaviors.
  • Static analysis: The study captures snapshots of interaction; longitudinal studies over years could reveal deeper learning curves.
  • Future directions suggested by the authors include expanding the dataset to multi‑disciplinary corpora, incorporating eye‑tracking or think‑aloud protocols to better understand cognitive load, and testing adaptive UI components that respond to the identified usage phases (exploration → drafting → citation verification).

Authors

  • Dany Haddad
  • Dan Bareket
  • Joseph Chee Chang
  • Jay DeYoung
  • Jena D. Hwang
  • Uri Katz
  • Mark Polak
  • Sangho Suh
  • Harshit Surana
  • Aryeh Tiktinsky
  • Shriya Atmakuri
  • Jonathan Bragg
  • Mike D’Arcy
  • Sergey Feldman
  • Amal Hassan-Ali
  • Rubén Lozano
  • Bodhisattwa Prasad Majumder
  • Charles McGrady
  • Amanpreet Singh
  • Brooke Vlahos
  • Yoav Goldberg
  • Doug Downey

Paper Information

  • arXiv ID: 2602.23335v1
  • Categories: cs.HC, cs.AI, cs.IR
  • Published: February 26, 2026