PulseAugur / Brief

last 24h
[1/1] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. EpiBench: Verifiable Evaluation of AI Agents on Epigenomics Analysis

    EpiBench is a new benchmark that evaluates AI agents on short-horizon epigenomics analysis tasks. It comprises 106 evaluations across various genomic assay workflows, and no AI system passed a majority of attempts. GPT-5.5 / Pi performed best, passing 45.0% of tasks, followed closely by GPT-5.5 / OpenAI Codex and Claude Opus 4.8 Max / Pi. Agents could often identify the correct files and compute intermediate results, but they struggled with tasks requiring deep, assay-specific scientific judgment.

    IMPACT: Highlights current limitations of AI agents in complex scientific domains, indicating a need for improved reasoning and domain-specific judgment.