PulseAugur

New SciRisk-Bench benchmark evaluates the safety of AI in scientific domains

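Researchers have introduced SciRisk-Bench, a new benchmark for evaluating the safety of AI models used in scientific applications (AI4Science). The benchmark assesses a model's ability to identify and avoid risks across different scientific disciplines and specific risk dimensions. SciRisk-Bench covers 7 disciplines, 31 subdisciplines, and 10 distinct risk dimensions, offering a more detailed analysis of AI safety in science than earlier datasets.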

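Impact: Strengthens safety evaluation for AI models deployed in scientific research, which may lead to more reliable and safer AI4Science applications.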

Ranking rationale: This cluster describes a new academic benchmark for AI safety research.

Read on Hugging Face Daily Papers →

AI-generated summary · Google Gemini · based on 3 sources. How we write summaries →

Sources [3]

  1. arXiv cs.AI TIER_1 English(EN) · Linghao Feng, Yinqian Sun, Dongqi Liang, Sicheng Shen, Chenfei Yan, Yuxuan Peng, Yilin Zhao, Haibo Tong, Kai Li, FeiFei Zhao, Yi Zeng ·

    SciRisk-Bench: A Risk-Dimension-Aware Benchmark for AI4Science Safety

    arXiv:2606.18936v1 · Abstract: Large language models (LLMs) are increasingly embedded in AI for Science (AI4Science) workflows, from scientific question answering and literature analysis to laboratory planning and autonomous discovery. This progress creates an ur…

  2. arXiv cs.AI TIER_1 English(EN) · Yi Zeng ·

    SciRisk-Bench: A Risk-Dimension-Aware Benchmark for AI4Science Safety

    Large language models (LLMs) are increasingly embedded in AI for Science (AI4Science) workflows, from scientific question answering and literature analysis to laboratory planning and autonomous discovery. This progress creates an urgent need for safety benchmarks that evaluate no…

  3. Hugging Face Daily Papers TIER_1 English(EN) ·

    SciRisk-Bench: A Risk-Dimension-Aware Benchmark for AI4Science Safety

    Large language models (LLMs) are increasingly embedded in AI for Science (AI4Science) workflows, from scientific question answering and literature analysis to laboratory planning and autonomous discovery. This progress creates an urgent need for safety benchmarks that evaluate no…