SciCustom: A Framework for Custom Evaluation of Scientific Capabilities in Large Language Models

New Framework SciCustom Customizes LLM Evaluation for Scientific Tasks

By the PulseAugur editorial team · [1 source] · 2026-05-19 04:41

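Researchers have developed SciCustom, a new framework for creating customized benchmarks that evaluate the scientific capabilities of large language models. The system addresses the limitations of existing benchmarks by building application-specific evaluations from large-scale scientific data. SciCustom organizes scientific knowledge into units, maps data instances to them, and retrieves the relevant units to generate a benchmark, revealing fine-grained differences between LLMs without expert annotation or synthetic data.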
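The three steps named in the summary (organize knowledge into units, map data instances, retrieve relevant units) can be pictured with a toy sketch. This is not the SciCustom implementation, and the summary does not say how mapping or retrieval is actually done. Plain keyword overlap is used here as a stand-in, and every name below (`KnowledgeUnit`, `Instance`, `map_instances`, `retrieve_units`, `build_benchmark`, the sample data) is hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeUnit:
    """Hypothetical unit of scientific knowledge, described by keywords."""
    name: str
    keywords: frozenset


@dataclass(frozen=True)
class Instance:
    """Hypothetical data instance: a question with a reference answer."""
    question: str
    answer: str


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    return {tok.strip(".,?!():;") for tok in text.lower().split()} - {""}


def map_instances(units, instances):
    """Attach each instance to every unit whose keywords it mentions."""
    index = defaultdict(list)
    for inst in instances:
        tokens = tokenize(inst.question)
        for unit in units:
            if unit.keywords & tokens:
                index[unit.name].append(inst)
    return index


def retrieve_units(units, application, top_k=2):
    """Rank units by keyword overlap with the application description."""
    query = tokenize(application)
    scored = [(len(u.keywords & query), u) for u in units]
    scored = [(s, u) for s, u in scored if s > 0]
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1].name))
    return [u for _, u in scored[:top_k]]


def build_benchmark(units, instances, application, top_k=2):
    """Collect the instances mapped to the units retrieved for an application."""
    index = map_instances(units, instances)
    benchmark = []
    for unit in retrieve_units(units, application, top_k):
        for inst in index[unit.name]:
            if inst not in benchmark:  # avoid duplicates across units
                benchmark.append(inst)
    return benchmark


# Invented sample data, for illustration only.
units = [
    KnowledgeUnit("thermodynamics", frozenset({"entropy", "enthalpy", "heat"})),
    KnowledgeUnit("genetics", frozenset({"gene", "dna", "allele"})),
    KnowledgeUnit("kinetics", frozenset({"rate", "catalyst", "activation"})),
]
instances = [
    Instance("How does entropy change when ice melts?", "It increases."),
    Instance("What does a catalyst do to the activation energy?", "It lowers it."),
    Instance("Which molecule stores the gene sequence?", "DNA"),
]
application = "Screening a catalyst using reaction rate and heat data"

benchmark = build_benchmark(units, instances, application)
for inst in benchmark:
    print(inst.question)
```

In this sketch the benchmark is assembled entirely from existing instances, which mirrors the summary's point that no expert annotation or synthetic data is needed; a real system would presumably replace the keyword overlap with a stronger mapping and retrieval method.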

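Impact: Enables more precise evaluation of LLM performance in scientific domains, and may guide the development of future LLMs for research applications.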

Ranking rationale: The cluster contains a research paper detailing a new framework for evaluating LLM capabilities.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CL · Ming Zhang · 2026-05-19 04:41

SciCustom: A Framework for Custom Evaluation of Scientific Capabilities in Large Language Models

Large language models (LLMs) are increasingly applied to scientific research, yet existing evaluations often fail to reflect the fine-grained capabilities required in practice. Most benchmarks are manually curated or domain-generic, limiting scalability and alignment with real sc…

