Researchers have introduced AssayBench, a benchmark for evaluating how well large language models (LLMs) and agents predict cellular phenotypes. Built on 1,920 CRISPR screens, it focuses on predicting the effects of cellular perturbations, a task crucial for drug discovery. Evaluations show that current LLMs, particularly generalist models, significantly outperform biology-specific models and trainable baselines, and that optimization techniques can improve results further.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Provides a standardized method for assessing AI's potential in biological discovery and drug development.
RANK_REASON The cluster contains a new academic paper introducing a benchmark for evaluating AI models.