[Paper] Picking the Right Specialist: Attentive Neural Process-based Selection of Task-Specialized Models as Tools for Agentic Healthcare Systems

Published: February 16, 2026 (11:36 AM EST)
Source: arXiv - 2602.14901v1

Overview

The paper introduces ToolSelect, a learning‑based system that lets an AI “agent” pick the most suitable specialist model (or “tool”) for a given clinical query. By treating model selection as a learned task and leveraging an Attentive Neural Process, the authors show how to automatically route each request to the specialist that will perform best—crucial for complex, multi‑task healthcare AI that must juggle diagnosis, image localization, report generation, and visual‑question‑answering.

Key Contributions

  • ToolSelect framework: a novel selector that conditions on both the input query and concise behavioral summaries of each candidate model, using an Attentive Neural Process to predict the optimal tool.
  • Consistent surrogate loss: formulation of a population‑risk minimization objective that approximates the true task‑conditional selection loss, enabling stable training.
  • First agentic chest‑X‑ray testbed: a comprehensive environment containing 55 heterogeneous specialist models (disease detection, report generation, visual grounding, VQA).
  • ToolSelectBench: a benchmark of 1,448 realistic clinical queries spanning four task families, with ground‑truth “best‑tool” labels.
  • Empirical superiority: ToolSelect outperforms ten state‑of‑the‑art baselines (including ensemble methods, meta‑learners, and reinforcement‑learning selectors) across all tasks.

Methodology

  1. Tool pool & summaries: Each specialist model is pre‑trained on a specific task (e.g., detecting pneumonia, generating radiology reports). For every model, a lightweight “behavioral summary” is computed—statistics such as confidence distributions, past performance on similar inputs, and feature embeddings.
  2. Attentive Neural Process (ANP) selector:
    • Context: The query (e.g., a chest X‑ray image plus a textual prompt) is encoded with a CNN + Transformer backbone.
    • Target: The set of model summaries acts as target points.
    • Attention: The ANP attends to the most relevant summaries given the query, producing a distribution over tools.
  3. Training objective: The selector is trained to minimize a surrogate loss that approximates the expected task loss if the chosen tool were used. This surrogate is consistent—optimizing it provably drives the selector toward the true optimal tool selection policy.
  4. Evaluation pipeline: On the new Chest X‑ray environment, each query is passed through ToolSelect, which selects a tool; the chosen tool’s output is then scored against the ground‑truth answer.
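The core of steps 1–3 is scaled dot‑product attention from the query embedding to the tool summaries, producing a distribution over tools. The sketch below is not the authors' implementation; it is a minimal NumPy illustration in which the summary embeddings, their dimensionality, and the one‑hot toy pool are all assumptions made for clarity.

```python
import numpy as np

def select_tool(query_emb, tool_summaries, temperature=1.0):
    """Attend from a query embedding to behavioral-summary embeddings.

    query_emb:      (d,) embedding of the clinical query
    tool_summaries: (n_tools, d) one summary embedding per specialist
    Returns (softmax weights over tools, index of the selected tool).
    """
    d = query_emb.size
    # Scaled dot-product attention scores, one per candidate tool
    scores = tool_summaries @ query_emb / (temperature * np.sqrt(d))
    scores = scores - scores.max()            # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights, int(np.argmax(weights))

# Toy pool: 5 specialists with orthogonal (one-hot) summary embeddings
n_tools, d = 5, 16
summaries = np.eye(n_tools, d)
# A query whose embedding resembles specialist 2's behavioral summary
query = summaries[2] + 0.05 * np.ones(d)

weights, choice = select_tool(query, summaries)
```

In the full ANP, the attention output parameterizes a latent distribution rather than a plain softmax, and training minimizes the consistent surrogate loss described in step 3; this sketch only shows the routing mechanics.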

Results & Findings

| Task Family | Baseline Avg. Accuracy | ToolSelect Accuracy |
|---|---|---|
| Disease Detection (17 models) | 71.2 % | 78.9 % |
| Report Generation (19 models) | 62.5 % | 70.3 % |
| Visual Grounding (6 models) | 68.0 % | 75.4 % |
| VQA (13 models) | 64.1 % | 71.8 % |
  • ToolSelect consistently beats the strongest baseline by 6–9 percentage points across all families.
  • Ablation studies show that removing the attention mechanism or the behavioral summaries drops performance by ~4 pp, confirming their importance.
  • The selector remains lightweight (≈ 2 M parameters) and adds < 15 ms latency per query, making it viable for real‑time clinical pipelines.

Practical Implications

  • Dynamic tool orchestration: Healthcare AI platforms can now automatically delegate each request to the model that is empirically best for that specific case, improving diagnostic accuracy without manual model management.
  • Scalable multi‑task systems: As new specialist models (e.g., for emerging diseases) are added, ToolSelect can incorporate them simply by generating their summaries—no retraining of the entire system is required.
  • Reduced inference cost: By selecting a single optimal tool rather than running an ensemble of all models, computational load and cloud costs drop dramatically.
  • Regulatory compliance: Transparent selection logic (the attention weights over model summaries) can be logged for audit trails, helping meet medical AI governance standards.
  • Developer workflow: Engineers can plug any PyTorch/TensorFlow model into the pool, expose its summary API, and immediately benefit from the selector, accelerating prototyping of agentic health assistants.
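The plug‑in workflow above can be pictured as a small registry: each specialist is wrapped with an inference callable and its behavioral summary, and the selector only ever consumes the summaries. The interface names below (`ToolEntry`, `ToolPool`, `register`) are hypothetical, not an API from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class ToolEntry:
    name: str
    predict: Callable[[str], str]   # wraps any framework's inference call
    summary: Dict[str, float]       # behavioral summary the selector attends to

@dataclass
class ToolPool:
    tools: List[ToolEntry] = field(default_factory=list)

    def register(self, name: str, predict: Callable[[str], str],
                 summary: Dict[str, float]) -> None:
        """Add a specialist; only its summary is needed, no selector retraining."""
        self.tools.append(ToolEntry(name, predict, summary))

    def summaries(self) -> List[Dict[str, float]]:
        return [t.summary for t in self.tools]

pool = ToolPool()
pool.register("pneumonia-detector",
              predict=lambda image_path: "pneumonia: 0.91",
              summary={"mean_confidence": 0.88, "val_accuracy": 0.79})
pool.register("report-generator",
              predict=lambda image_path: "No acute findings.",
              summary={"mean_confidence": 0.72, "val_accuracy": 0.70})
```

Keeping the selector decoupled from each tool's framework (PyTorch, TensorFlow, or a remote API) is what makes adding a new specialist a summary‑generation step rather than a system retrain.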

Limitations & Future Work

  • Dependence on summary quality: The selector’s performance hinges on informative behavioral summaries; poorly calibrated summaries can mislead the attention mechanism.
  • Static pool assumption: The current setup assumes a fixed set of specialist models during training; handling truly online addition/removal of tools remains an open challenge.
  • Domain specificity: Benchmarks are limited to chest X‑ray tasks; extending to other imaging modalities (CT, MRI) or non‑visual data (EHR notes) will test generality.
  • Explainability: While attention weights provide some insight, deeper interpretability of why a particular tool was chosen is still needed for high‑stakes clinical decisions.

Overall, ToolSelect offers a practical, data‑driven solution for orchestrating heterogeneous AI specialists in agentic healthcare systems, paving the way for more reliable and efficient clinical AI assistants.

Authors

  • Pramit Saha
  • Joshua Strong
  • Mohammad Alsharid
  • Divyanshu Mishra
  • J. Alison Noble

Paper Information

  • arXiv ID: 2602.14901v1
  • Categories: cs.LG, cs.AI, cs.CV, cs.MA
  • Published: February 16, 2026
