[Paper] EpiBench: Verifiable Evaluation of AI Agents on Epigenomics Analysis

Published: June 11, 2026 at 01:20 PM EDT

Source: arXiv - 2606.13602v1

Overview

We introduce EpiBench, a verifiable benchmark for short-horizon epigenomics analysis. EpiBench evaluates whether agents can make well-defined analysis decisions from realistic workflow states and return deterministically gradable answers. The benchmark includes 106 evaluations across CUT&Tag/CUT&RUN, ATAC-seq, ChIP-seq, and DNA methylation workflows. Across 5,088 valid trajectories from 16 model-harness pairs, no system passed a majority of attempts: GPT-5.5 / Pi led at 45.0% (143/318 attempts; 95% confidence interval (CI), 36.3–53.7), followed by GPT-5.5 / OpenAI Codex at 39.9% (127/318 attempts; 95% CI, 31.6–48.3). Claude Opus 4.8 Max / Pi and GPT-5.4 / Pi each passed 39.0% (124/318 attempts; 95% CI, 30.2–47.8 and 31.0–47.0, respectively). Performance varies across assay types, and many failed runs still contain parts of the correct answer. Agents often found the right files and computed useful intermediate results, but failed when the task required deeper, assay-specific scientific judgment.
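The headline percentages follow directly from the pass counts quoted above. The following is a minimal Python sketch that recomputes them and checks that 16 pairs at 318 attempts each give the reported 5,088 trajectories. The dictionary layout and the names `PASSES`, `ATTEMPTS_PER_PAIR`, and `pass_rate` are illustrative and do not come from the paper. The confidence intervals are not reproduced, since the abstract does not say how they were computed.

```python
# Pass counts per model-harness pair, copied from the abstract.
# All names in this sketch are illustrative, not taken from the paper.
ATTEMPTS_PER_PAIR = 318
NUM_PAIRS = 16

PASSES = {
    "GPT-5.5 / Pi": 143,
    "GPT-5.5 / OpenAI Codex": 127,
    "Claude Opus 4.8 Max / Pi": 124,
    "GPT-5.4 / Pi": 124,
}


def pass_rate(passed: int, attempts: int = ATTEMPTS_PER_PAIR) -> float:
    """Return the pass rate as a percentage rounded to one decimal place."""
    return round(100 * passed / attempts, 1)


rates = {pair: pass_rate(n) for pair, n in PASSES.items()}

# Consistency check: pairs x attempts per pair should match the reported total.
total_trajectories = NUM_PAIRS * ATTEMPTS_PER_PAIR

if __name__ == "__main__":
    for pair, rate in rates.items():
        print(f"{pair}: {rate}%")
    print(f"total trajectories: {total_trajectories}")
```

Note that 318 attempts per pair is also 106 × 3, which would be consistent with three attempts per evaluation, though that reading is an inference from the arithmetic rather than something the abstract states.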

Authors

Harihara Muralidharan
Reema Baskar
Soo Hee Lee
Tim Proctor
Kenny Workman

Paper Information

arXiv ID: 2606.13602v1
Categories: cs.AI
Published: June 11, 2026
