[Paper] SymptomAI: Towards a Conversational AI Agent for Everyday Symptom Assessment

Published: May 5, 2026 at 01:36 PM EDT

Source: arXiv - 2605.04012v1

Overview

A new study introduces SymptomAI, a suite of conversational agents embedded in the Fitbit app that interview users about their everyday health symptoms and generate differential diagnoses. Testing the agents with nearly 14,000 real‑world participants, the researchers show that a structured, symptom‑focused interview improves diagnostic accuracy by roughly 15 percentage points over the more casual, user‑driven chats typical of consumer LLMs today.

Key Contributions

Large‑scale real‑world deployment: 13,917 participants interacted with five distinct AI agents via a popular wearable platform.
Rigorous clinical evaluation: 1,228 users supplied a self‑reported, clinician‑provided diagnosis; 517 of these cases were independently reviewed by a panel of physicians (250+ hours of annotation).
Demonstrated diagnostic superiority: SymptomAI’s differential diagnoses had 2.47 times the odds of matching the clinician’s label compared with diagnoses produced by clinicians who saw only the raw dialogue (OR = 2.47, p < 0.001).
Agentic interview design matters: Agents that first conduct a systematic symptom interview before offering a diagnosis outperform “user‑guided” agents that let the conversation flow freely (p < 0.001).
Physiological validation: Using the AI‑generated labels, the team linked more than 500,000 days of wearable data to roughly 400 conditions, uncovering strong physiological signatures (e.g., OR > 7 for influenza).
General‑population robustness: An auxiliary analysis on 1,509 conversations from a broader US panel confirmed that findings extend beyond Fitbit users.

Methodology

Agent variants – Five conversational bots were built on top of large language models (LLMs). Two were “agentic”: they followed a scripted, evidence‑based symptom interview (ask about onset, severity, associated features, etc.) before proposing a diagnosis. The other three were “user‑guided”: they responded directly to whatever the user typed, mimicking typical consumer chatbots.
Deployment – The agents were integrated into the Fitbit mobile app. Participants were randomly assigned to one of the five bots and asked to describe any health concerns they were experiencing.
Ground‑truth collection – After the AI interview, users could optionally upload a clinician‑provided diagnosis (e.g., from a recent doctor visit). This yielded 1,228 self‑reported clinical labels.
Clinical adjudication – A separate panel of physicians reviewed the full AI‑user dialogue (blinded to the AI’s output) and supplied their own differential diagnosis for 517 of the cases.
Statistical analysis – Diagnostic agreement was measured using odds ratios and significance testing. Wearable sensor streams (heart rate, temperature, activity) were aligned with the AI‑derived condition labels to explore physiological correlates.
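
The difference between the two agent styles can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the slot names (onset, severity, associated_features) are borrowed from the interview topics mentioned above, while the question wording, the InterviewState class, and the stub_diagnosis placeholder (standing in for an LLM call) are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical interview slots, loosely based on the topics the summary
# mentions (onset, severity, associated features). Not the paper's script.
REQUIRED_SLOTS = ("onset", "severity", "associated_features")

QUESTIONS = {
    "onset": "When did the symptom start?",
    "severity": "How severe is it on a scale of 1 to 10?",
    "associated_features": "Have you noticed any other symptoms alongside it?",
}


@dataclass
class InterviewState:
    chief_complaint: str
    answers: dict = field(default_factory=dict)

    def next_question(self):
        """Return (slot, question) for the first unanswered slot, or None."""
        for slot in REQUIRED_SLOTS:
            if slot not in self.answers:
                return slot, QUESTIONS[slot]
        return None

    def record(self, slot, answer):
        self.answers[slot] = answer


def agentic_turn(state, propose_diagnosis):
    """Keep asking until every slot is filled; only then propose a differential."""
    pending = state.next_question()
    if pending is not None:
        return {"type": "question", "slot": pending[0], "text": pending[1]}
    return {"type": "differential", "items": propose_diagnosis(state)}


def user_guided_turn(state, propose_diagnosis):
    """Answer immediately with whatever information is available so far."""
    return {"type": "differential", "items": propose_diagnosis(state)}


def stub_diagnosis(state):
    """Placeholder for an LLM call (assumption for this sketch)."""
    return [f"placeholder differential from {len(state.answers)} answered slots"]


if __name__ == "__main__":
    state = InterviewState(chief_complaint="headache")
    print(agentic_turn(state, stub_diagnosis))      # asks about onset first
    print(user_guided_turn(state, stub_diagnosis))  # proposes a differential right away
```

The only design point the sketch is meant to show is the gate in agentic_turn: the diagnosis step cannot run until the interview is complete, whereas the user‑guided variant has no such gate.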

Results & Findings

Diagnostic accuracy: SymptomAI’s agentic bots matched the clinician’s diagnosis in 42% of adjudicated cases versus 23% for the clinician‑only baseline (OR = 2.47, p < 0.001).
Interview style effect: Structured symptom interviews boosted accuracy by ~15 percentage points over user‑guided chats (p < 0.001).
Physiological signatures: Acute infections (influenza, COVID‑19) showed the strongest wearable changes, namely elevated resting heart rate and reduced activity, with odds ratios > 7 relative to healthy periods.
Generalizability: The same performance gap between agentic and user‑guided bots appeared in the external US panel, indicating the effect is not limited to Fitbit’s user base.
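
For readers less familiar with odds ratios, the short sketch below shows how the 42% and 23% match rates reported above relate to an odds ratio, and how that differs from a simple ratio of the two rates. It uses only the rounded percentages from this summary, so it gives roughly 2.42 rather than the reported 2.47; the gap presumably comes from rounding or from the paper's statistical model, which is an assumption on our part.

```python
def odds(p):
    """Convert a probability strictly between 0 and 1 into odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return p / (1.0 - p)


def odds_ratio(p_a, p_b):
    """Odds of group A divided by odds of group B."""
    return odds(p_a) / odds(p_b)


def rate_ratio(p_a, p_b):
    """Plain ratio of the two rates (how many times 'more likely')."""
    return p_a / p_b


# Rounded match rates quoted in this summary.
ai_rate = 0.42
clinician_rate = 0.23

print(round(odds_ratio(ai_rate, clinician_rate), 2))  # 2.42
print(round(rate_ratio(ai_rate, clinician_rate), 2))  # 1.83
```

The second number is a reminder that an odds ratio of about 2.4 is not the same as being 2.4 times more likely to match: in terms of raw match rates, the agentic bots matched about 1.8 times as often.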

Practical Implications

Better consumer health assistants: Embedding a brief, evidence‑based symptom interview into any LLM‑powered health chatbot can raise diagnostic accuracy, making the tool more trustworthy for users seeking triage advice.
Integration with wearables: Linking AI‑generated condition labels to continuous sensor data could enable earlier detection of disease patterns (e.g., spotting an influenza outbreak from aggregated heart‑rate spikes).
Clinical decision support: Front‑line clinicians could receive a pre‑populated symptom checklist from the AI, reducing interview time and standardizing data capture.
Regulatory pathways: Demonstrating a measurable improvement over clinician‑only interpretation may help satisfy FDA or other health‑technology regulators when positioning such agents as “clinical decision‑support” rather than pure consumer chatbots.
Product roadmap for health apps: Companies can differentiate their offerings by moving from open‑ended chat to a guided interview flow, potentially unlocking new revenue streams (e.g., premium symptom‑tracking subscriptions).
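
As a rough illustration of the wearable‑integration idea above, the sketch below scores resting heart rate during a suspected illness window against a user's own baseline. It is not the paper's method: the z‑score approach, the threshold of 2.0, and all heart‑rate values are made up for the example.

```python
from statistics import mean, stdev


def resting_hr_z_scores(baseline_days, candidate_days):
    """Z-score each candidate day's resting heart rate against a personal baseline."""
    if len(baseline_days) < 2:
        raise ValueError("need at least two baseline days")
    mu = mean(baseline_days)
    sigma = stdev(baseline_days)  # sample standard deviation
    if sigma == 0:
        raise ValueError("baseline has zero variance")
    return [(x - mu) / sigma for x in candidate_days]


def flag_elevated(z_scores, threshold=2.0):
    """Mark days whose z-score exceeds the (arbitrary) threshold."""
    return [z > threshold for z in z_scores]


# Made-up resting heart rates in beats per minute, for illustration only.
baseline = [58, 60, 59, 61, 60, 62, 60]
illness_window = [61, 66, 70]

z = resting_hr_z_scores(baseline, illness_window)
print(flag_elevated(z))  # [False, True, True]
```

A per‑user baseline is used here because resting heart rate varies widely between people, so an absolute cutoff would be a poor signal; a production system would also need to handle missing days and confounders such as exercise or alcohol.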

Limitations & Future Work

Self‑reported ground truth: The “clinician diagnosis” used for labeling relies on users uploading their own records, which may be incomplete or inaccurate.
Population bias: Although an external panel was added, the primary cohort consists of Fitbit users who may be more health‑conscious and tech‑savvy than the general public.
Scope of conditions: The study focused on common acute illnesses; performance on chronic, multi‑system diseases remains untested.
Explainability: The agents provide a diagnosis but limited rationale; future work should surface reasoning to improve user trust and clinician acceptance.
Regulatory compliance: Further validation under controlled clinical trials will be needed before deployment as a medical device or diagnostic aid.

SymptomAI shows that a modest change in conversation design, asking the right questions first, can make a generic LLM a far more useful health assistant. For developers integrating AI into health‑tech products, the lesson is that structure matters, and that pairing conversational AI with wearable data is a promising route to earlier disease detection.

Authors

Joseph Breda
Fadi Yousif
Beszel Hawkins
Marinela Cotoi
Miao Liu
Ray Luo
Po-Hsuan Cameron Chen
Mike Schaekermann
Samuel Schmidgall
Xin Liu
Girish Narayanswamy
Samuel Solomon
Maxwell A. Xu
Xiaoran Fan
Longfei Shangguan
Anran Wang
Bhavna Daryani
Buddy Herkenham
Cara Tan
Mark Malhotra
Shwetak Patel
John B. Hernandez
Quang Duong
Yun Liu
Zach Wasson
Dimitrios Antos
Bob Lou
Matthew Thompson
Jonathan Richina
Anupam Pathak
Nichole Young-Lin
Jake Sunshine
Daniel McDuff

Paper Information

arXiv ID: 2605.04012v1
Categories: cs.AI
Published: May 5, 2026
