SciHorizon-GENE: Benchmarking LLM for Life Sciences Inference from Gene Knowledge to Functional Understanding
Researchers have introduced SciHorizon-GENE, a new benchmark designed to evaluate how well large language models (LLMs) understand and reason about gene-level biological information. The benchmark, derived from extensive biological databases, includes over 540,000 questions covering gene-to-function reasoning relevant to cell annotation and mechanism analysis. Evaluations of current LLMs reveal significant variation in their gene-level reasoning abilities and persistent problems with generating accurate and complete functional interpretations.
AI IMPACT: Establishes a new standard for evaluating LLM performance in life sciences, guiding development for biological interpretation tasks.