Researchers have introduced MolViBench, a benchmark designed to evaluate how well large language models (LLMs) handle molecular coding tasks. It addresses a gap left by existing evaluations, which either lack chemistry knowledge or test recall rather than executable code generation. MolViBench includes 358 tasks across five cognitive levels, covering 12 real-world drug discovery workflows, and uses a multi-layered framework to assess both code executability and chemical correctness.
Impact: Establishes a new evaluation standard for LLMs in molecular discovery, potentially guiding future model development for scientific applications.
Ranking rationale: The cluster describes a new academic paper introducing a benchmark for evaluating LLMs in a specific domain.
AI-generated summary · Google Gemini · from 2 sources.