EpiBench benchmark reveals AI agents struggle with epigenomics analysis

By PulseAugur Editorial · [2 sources] · 2026-06-11 17:20

EpiBench is a new benchmark that evaluates AI agents on short-horizon epigenomics analysis tasks. It includes 106 evaluations across various genomic assay workflows, and no AI system passed a majority of attempts. GPT-5.5 / Pi performed best, passing 45.0% of tasks, followed closely by GPT-5.5 / OpenAI Codex and Claude Opus 4.8 Max / Pi. Agents could often identify the correct files and compute intermediate results, but they struggled with tasks requiring deep, assay-specific scientific judgment.

IMPACT Highlights current limitations of AI agents in complex scientific domains, indicating a need for improved reasoning and domain-specific judgment.

RANK_REASON The cluster describes a new academic benchmark for evaluating AI agents on a specific scientific task.


AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv cs.AI TIER_1 English(EN) · Harihara Muralidharan, Reema Baskar, Soo Hee Lee, Tim Proctor, Kenny Workman · 2026-06-12 04:00

EpiBench: Verifiable Evaluation of AI Agents on Epigenomics Analysis

arXiv:2606.13602v1 · Abstract: We introduce EpiBench, a verifiable benchmark for short-horizon epigenomics analysis. EpiBench evaluates whether agents can make well-defined analysis decisions from realistic workflow states and return deterministically gradable an…

arXiv cs.AI TIER_1 English(EN) · Kenny Workman · 2026-06-11 17:20

EpiBench: Verifiable Evaluation of AI Agents on Epigenomics Analysis

We introduce EpiBench, a verifiable benchmark for short-horizon epigenomics analysis. EpiBench evaluates whether agents can make well-defined analysis decisions from realistic workflow states and return deterministically gradable answers. The benchmark includes 106 evaluations ac…

