[Paper] Autonomous Uncertainty Quantification for Computational Point-of-care Sensors

Published: December 24, 2025 at 01:59 PM EST
4 min read
Source: arXiv - 2512.21335v1

Overview

A new study demonstrates how to make AI‑driven point‑of‑care (POC) diagnostic sensors more trustworthy by automatically flagging uncertain predictions. Using a paper‑based vertical flow assay for Lyme disease, the authors show that Monte Carlo dropout can identify “risky” outputs and discard them, boosting clinical sensitivity without any human‑in‑the‑loop verification.

Key Contributions

  • Autonomous uncertainty quantification (UQ) for neural‑network‑based POC diagnostics, requiring no external ground‑truth labels at inference time.
  • Integration of Monte Carlo dropout (MCDO) into the inference pipeline of a handheld, paper‑based vertical flow assay (xVFA) for Lyme disease.
  • Demonstrated sensitivity gain from 88.2 % to 95.7 % on a blinded clinical cohort by rejecting high‑uncertainty predictions.
  • End‑to‑end prototype combining a disposable assay, a low‑cost optical reader, and on‑device AI that delivers a result within 20 minutes from just 20 µL of serum.
  • Generalizable framework that can be transplanted to other rapid diagnostic tests (RDTs) and computational biosensors.

Methodology

  1. Sensor platform (xVFA) – A paper‑based vertical flow assay captures antibodies from a patient’s serum. After a short incubation, an optical reader records a grayscale image of the test zone.
  2. Neural network inference – A small convolutional network processes the image and outputs a probability of Lyme disease positivity.
  3. Monte Carlo dropout for UQ – During inference, dropout layers remain active and the model is run N times (e.g., 30 stochastic forward passes), yielding a distribution of predictions per sample (steps 3–5 are sketched in code after this list).
  4. Uncertainty metric – The variance (or entropy) across the N predictions quantifies uncertainty. A pre‑defined threshold separates “confident” from “uncertain” cases.
  5. Autonomous decision rule – If uncertainty exceeds the threshold, the system automatically withholds the diagnosis (e.g., prompts retest or referral). Otherwise, it reports the neural network’s majority vote.
  6. Evaluation – The pipeline was tested on a blinded set of clinical serum samples, comparing sensitivity and specificity before and after applying the UQ filter.
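
A minimal sketch of steps 3–5, assuming a trained PyTorch binary classifier whose final layer emits a single positivity logit; the pass count, the variance metric, and the cutoff `TAU` are illustrative stand‑ins, not the paper's actual settings:

```python
import torch

N_PASSES = 30  # stochastic forward passes (the summary cites ~30)
TAU = 0.15     # uncertainty cutoff -- hypothetical; set on validation data

def enable_dropout(model: torch.nn.Module) -> None:
    """Put the model in eval mode but keep dropout layers sampling."""
    model.eval()
    for module in model.modules():
        if module.__class__.__name__.startswith("Dropout"):
            module.train()

@torch.no_grad()
def diagnose(model: torch.nn.Module, image: torch.Tensor) -> str:
    """MC-dropout inference with an autonomous accept/withhold rule."""
    enable_dropout(model)
    # N stochastic passes -> a distribution of positivity probabilities
    probs = torch.stack(
        [torch.sigmoid(model(image)).squeeze() for _ in range(N_PASSES)]
    )
    if probs.var(unbiased=False).item() > TAU:  # predictive variance as the UQ metric
        return "WITHHELD"  # too uncertain: prompt a retest or referral
    votes = (probs >= 0.5).float()              # majority vote across the N passes
    return "POSITIVE" if votes.mean().item() > 0.5 else "NEGATIVE"
```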

Results & Findings

| Metric | Without UQ | With MCDO‑UQ (high‑uncertainty filtered) |
|---|---|---|
| Sensitivity (true‑positive rate) | 88.2 % | 95.7 % |
| Specificity (true‑negative rate) | ~92 % | ~92 % (unchanged) |
| Fraction of samples rejected | 0 % | ~12 % (those flagged as high‑uncertainty) |
| Overall diagnostic accuracy | 90 % | 94 % |

Interpretation: By discarding the roughly 12 % of predictions that the model itself deemed unreliable, the system avoided false negatives that would otherwise have led to missed Lyme disease cases. Importantly, specificity remained stable because the uncertainty filter primarily caught ambiguous positives rather than negatives.
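
To make the arithmetic behind the table concrete, the sketch below (with hypothetical NumPy arrays `y_true`, `y_pred`, and per‑sample uncertainties `u`) recomputes sensitivity over only the accepted samples:

```python
import numpy as np

def filtered_metrics(y_true, y_pred, u, tau):
    """Sensitivity and rejection rate after discarding high-uncertainty cases."""
    keep = u <= tau  # boolean mask of accepted (confident) samples
    tp = np.sum((y_pred == 1) & (y_true == 1) & keep)
    fn = np.sum((y_pred == 0) & (y_true == 1) & keep)
    sensitivity = tp / (tp + fn) if (tp + fn) > 0 else float("nan")
    return sensitivity, 1.0 - keep.mean()  # (sensitivity, rejection rate)
```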

Practical Implications

  • Safer AI‑enabled diagnostics – Clinics and field health workers can place more trust in AI outputs, knowing the system flags and withholds its own unreliable predictions.
  • Reduced need for expert oversight – In remote or resource‑limited settings, the device can autonomously decide when to request a confirmatory test, lowering operational costs.
  • Scalable to other POC assays – The same MCDO‑UQ wrapper can be added to any neural‑network‑based RDT (e.g., COVID‑19 antigen tests, malaria rapid tests) with minimal code changes.
  • Regulatory friendliness – Demonstrating built‑in uncertainty handling aligns with emerging FDA guidance on AI/ML medical devices that require “continuous monitoring” and “risk mitigation.”
  • Edge‑computing feasibility – Monte Carlo dropout only adds a modest compute overhead (multiple forward passes) that fits on low‑power microcontrollers or smartphones, preserving the handheld form factor.
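
One common way to keep that overhead down, sketched below under the same assumptions as the earlier snippet (and reusing its hypothetical `enable_dropout` and `N_PASSES`), is to tile the input so all N stochastic passes run as one batched forward call, since each batch row draws an independent dropout mask:

```python
@torch.no_grad()
def mc_dropout_batched(model: torch.nn.Module, image: torch.Tensor) -> torch.Tensor:
    """All N_PASSES stochastic passes in a single forward call via input tiling."""
    enable_dropout(model)
    batch = image.expand(N_PASSES, *image.shape[1:])  # (N, C, H, W); no data copy
    return torch.sigmoid(model(batch)).squeeze(-1)    # (N,) positivity probabilities
```

The trade‑off is peak memory: the batched call holds N activations at once, so on very constrained microcontrollers the sequential loop may still be the better choice.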

Limitations & Future Work

  • Threshold selection – The uncertainty cutoff was empirically set; adaptive or patient‑specific thresholds could further improve performance (a simple sweep heuristic is sketched after this list).
  • Sample rejection rate – About 12 % of cases were withheld, which may be high for some workflows; future work should aim to reduce this while keeping sensitivity gains.
  • Single disease focus – Validation was limited to Lyme disease; broader clinical trials across diverse biomarkers are needed to confirm generality.
  • Hardware constraints – Real‑time MCDO inference on ultra‑low‑power devices may still be challenging; exploring alternative Bayesian approximations (e.g., deep ensembles) is a promising direction.
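
On the first point, one illustrative starting heuristic (reusing the hypothetical `filtered_metrics` helper from above) is to sweep candidate cutoffs on held‑out validation data and keep the one that meets a target sensitivity with the fewest rejections:

```python
def choose_threshold(y_true, y_pred, u, target_sensitivity=0.95):
    """Cutoff meeting the sensitivity target with the lowest rejection rate."""
    best = None
    for tau in np.unique(u):  # candidate cutoffs from the observed uncertainties
        sens, rej = filtered_metrics(y_true, y_pred, u, tau)
        if sens >= target_sensitivity and (best is None or rej < best[1]):
            best = (tau, rej)
    return best  # (tau, rejection_rate), or None if the target is unreachable
```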

By embedding autonomous uncertainty quantification directly into the AI core of computational POC sensors, this research paves the way for more reliable, deployable diagnostics that can truly serve underserved populations.

Authors

  • Artem Goncharov
  • Rajesh Ghosh
  • Hyou-Arm Joung
  • Dino Di Carlo
  • Aydogan Ozcan

Paper Information

  • arXiv ID: 2512.21335v1
  • Categories: physics.med-ph, cs.LG, physics.app-ph, physics.bio-ph
  • Published: December 24, 2025