[Paper] Hunt Globally: Deep Research AI Agents for Drug Asset Scouting in Investing, Business Development, and Search & Evaluation
Source: arXiv - 2602.15019v1
Overview
The paper introduces Bioptic, a tree‑structured, self‑learning AI agent designed to hunt for drug‑development assets hidden in the massive, often non‑English scientific and patent literature. By benchmarking against leading LLM‑based research tools, the authors show that Bioptic dramatically improves recall while avoiding hallucinations, an essential capability for investors, business‑development teams, and venture capitalists who need to spot “under‑the‑radar” biotech opportunities before competitors do.
Key Contributions
- A new benchmarking framework for drug‑asset scouting that simulates real‑world investor queries, mixes languages, and uses LLM‑as‑judge scoring calibrated to expert opinions.
- Bioptic Agent architecture: a tree‑based, self‑learning “bioptic” (dual‑view) system that combines coarse‑grained retrieval with fine‑grained verification to achieve high recall without hallucination.
- Comprehensive empirical evaluation against five state‑of‑the‑art research agents (Claude Opus 4.6, GPT‑5.2 Pro, Gemini 3 Pro + Deep Research, Perplexity Deep Research, Exa Websets).
- Evidence that scaling compute (more retrieval passes, larger model inference) yields steep performance gains for this task.
- Open‑source‑ready pipeline for generating benchmark queries from real investor screening prompts, enabling reproducibility and future extensions.
Methodology
- Query Collection – The team gathered real screening prompts from biotech investors, business‑development (BD) professionals, and venture capitalists. These prompts serve as “priors” that reflect the complex, multi‑criteria nature of asset scouting.
- Synthetic Benchmark Generation – Using the priors, a conditional language model creates a large set of realistic, multilingual search queries. Each query is paired with a ground‑truth list of drug assets that are outside the typical U.S.-centric radar (e.g., Chinese patents, non‑English conference papers).
- Bioptic Agent Design –
- Coarse Retrieval Layer: a tree‑structured search over heterogeneous data sources (patent databases, regional journals, preprint servers) using multilingual embeddings.
- Fine Verification Layer: a second‑stage LLM that validates each candidate against source documents, filtering out hallucinated or irrelevant results.
- Self‑Learning Loop: feedback from the verification layer updates retrieval weights, allowing the system to improve over time without human re‑labeling.
- Evaluation – An LLM‑as‑judge model, calibrated with expert annotations, scores precision, recall, and F1 for each system on the benchmark.
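The coarse‑retrieval / fine‑verification pattern described above can be sketched in miniature. Everything here is an illustrative stand‑in, not the paper's implementation: the toy corpus, the lexical‑overlap "retriever" (standing in for multilingual embeddings), and the substring "verifier" (standing in for the second‑stage LLM check) are all assumptions for the sake of the example.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    asset: str
    source_text: str
    score: float

# Toy corpus standing in for heterogeneous patent/literature sources.
CORPUS = [
    ("BTK-001", "Phase I BTK inhibitor disclosed in a Chinese patent filing"),
    ("KRAS-7",  "Preprint describing a KRAS G12C degrader in a regional journal"),
    ("NOISE-1", "Unrelated agronomy paper about crop yields"),
]

def coarse_retrieve(query: str, k: int = 3) -> list[Candidate]:
    """Coarse layer: cheap lexical overlap stands in for multilingual embeddings."""
    q_tokens = set(query.lower().split())
    scored = []
    for asset, text in CORPUS:
        overlap = len(q_tokens & set(text.lower().split()))
        scored.append(Candidate(asset, text, overlap / max(len(q_tokens), 1)))
    return sorted(scored, key=lambda c: c.score, reverse=True)[:k]

def fine_verify(query: str, cand: Candidate) -> bool:
    """Fine layer: stand-in for an LLM grounding check against the source
    document -- here, at least one query token must appear verbatim."""
    return any(tok in cand.source_text.lower() for tok in query.lower().split())

def scout(query: str) -> list[str]:
    """Pipeline: retrieve broadly, then keep only verified (grounded) hits."""
    return [c.asset for c in coarse_retrieve(query) if fine_verify(query, c)]

print(scout("btk inhibitor patent"))  # grounded hit kept, off-topic noise dropped
```

The design point being illustrated: the first layer is allowed to over‑retrieve for recall, because the second layer filters out anything that cannot be traced back to a source document, which is how the architecture keeps precision high.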
Results & Findings
| System | F1 Score |
|---|---|
| Bioptic Agent | 79.7 % |
| Claude Opus 4.6 | 56.2 % |
| Gemini 3 Pro + Deep Research | 50.6 % |
| GPT‑5.2 Pro | 46.6 % |
| Perplexity Deep Research | 44.2 % |
| Exa Websets | 26.9 % |
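For reference, the F1 figures above are the standard harmonic mean of precision and recall, computed per query over the returned assets versus the ground‑truth list. This is the textbook metric, not the paper's exact judging code; the asset names are hypothetical.

```python
def prf1(returned: set[str], truth: set[str]) -> tuple[float, float, float]:
    """Standard precision, recall, and F1 over returned vs. ground-truth assets."""
    tp = len(returned & truth)  # true positives: correctly surfaced assets
    precision = tp / len(returned) if returned else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# A hallucinated asset ("GHOST-9") hurts precision; a missed one hurts recall.
p, r, f1 = prf1({"BTK-001", "GHOST-9"}, {"BTK-001", "KRAS-7"})
print(p, r, f1)  # 0.5 0.5 0.5
```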
- Recall boost: Bioptic achieves an F1 of 79.7 %, a 23.5‑point absolute gain over the strongest baseline (Claude Opus 4.6 at 56.2 %), recovering far more of the hidden assets than any competitor.
- Hallucination control: The verification layer reduces false positives dramatically, keeping precision high even as recall rises.
- Compute scaling: Adding more retrieval passes (i.e., deeper tree exploration) yields a near‑linear improvement in F1, confirming that compute budget can be traded for coverage.
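One way to build intuition for the compute‑coverage trade‑off is a toy model (my own assumption, not the paper's analysis): if each independent retrieval pass surfaces a given hidden asset with probability p, then k passes give expected recall 1 − (1 − p)^k, which grows roughly linearly for small k before saturating.

```python
# Toy model (an assumption, not from the paper): independent retrieval passes
# compound coverage, so expected recall rises quickly and then saturates.
def expected_recall(p: float, passes: int) -> float:
    return 1 - (1 - p) ** passes

for k in (1, 2, 4, 8):
    print(f"passes={k} expected_recall={expected_recall(0.3, k):.3f}")
```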
Practical Implications
- Accelerated deal sourcing – Investment teams can automate the first‑pass scan of global patent and literature feeds, surfacing promising candidates weeks earlier than manual scouting.
- Risk mitigation – By reliably surfacing non‑English assets, firms reduce the chance of missing a breakthrough that could affect valuation or competitive positioning.
- Integration‑friendly – The tree‑based architecture can be wrapped as a micro‑service, plugging into existing CRMs, deal‑flow platforms, or internal knowledge graphs.
- Cost‑effective scaling – Since performance scales with compute, organizations can start with modest resources (e.g., a few GPUs) and ramp up as the pipeline proves ROI.
- Cross‑domain reuse – The bioptic pattern (coarse retrieval + fine verification) is applicable to other high‑recall, low‑hallucination domains such as regulatory compliance, threat intelligence, or scientific literature reviews.
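The micro‑service integration mentioned above could be as thin as a JSON‑in, JSON‑out handler around the agent. This is a hypothetical sketch: the request shape, field names, and stub agent are illustrative, not an API from the paper; a real deployment would sit behind an HTTP framework.

```python
import json

# Hypothetical handler: parse a JSON scouting request, delegate to the agent
# (a stub here), and serialize the surfaced assets back as JSON.
def handle_scout_request(body: bytes, agent=lambda q: ["EXAMPLE-ASSET"]) -> bytes:
    req = json.loads(body)
    assets = agent(req["query"])
    return json.dumps({"query": req["query"], "assets": assets}).encode()

resp = handle_scout_request(b'{"query": "btk inhibitor"}')
print(json.loads(resp))
```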
Limitations & Future Work
- Data freshness – The benchmark relies on static snapshots of patent and publication databases; real‑time updates (e.g., newly filed Chinese patents) could affect performance.
- Language coverage – While multilingual, the system currently favors languages with abundant pretrained embeddings; low‑resource languages may still be under‑represented.
- Compute cost – The steep gains with added compute come with higher operational expense, which may be prohibitive for smaller firms.
- Human‑in‑the‑loop validation – The study uses an LLM judge calibrated to experts, but a full user study with domain specialists would better quantify practical usability.
- Extending to therapeutic efficacy – Future work could integrate downstream data (e.g., clinical trial outcomes) to not just locate assets but also rank them by translational potential.
Authors
- Alisa Vinogradova
- Vlad Vinogradov
- Luba Greenwood
- Ilya Yasny
- Dmitry Kobyzev
- Shoman Kasbekar
- Kong Nguyen
- Dmitrii Radkevich
- Roman Doronin
- Andrey Doronichev
Paper Information
- arXiv ID: 2602.15019v1
- Categories: cs.AI, cs.IR
- Published: February 16, 2026