New Benchmark OpenSafeIntent Evaluates AI Safety Across User Intents

By PulseAugur Editorial · [2 sources] · 2026-07-02 11:14

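Researchers have introduced OpenSafeIntent, a new benchmark designed to evaluate whether AI models stay safe when the user intent behind the same task changes. The benchmark uses controlled prompt sets that include benign, dual-use, and malicious variants of a task to assess whether a model calibrates its assistance appropriately. The findings indicate that models often fail to remain safe when intent shifts, that dual-use behavior is brittle, and that responses which reframe a risky request as a safer task are less likely to cross safety boundaries.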
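The prompt-set structure described above can be sketched in a few lines of Python. This is only an illustration: the source does not give the benchmark's data format or scoring rules, so the names `PromptSet`, `Intent`, the response labels, and the `ACCEPTABLE` policy table are all hypothetical assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Intent(Enum):
    BENIGN = "benign"
    DUAL_USE = "dual_use"
    MALICIOUS = "malicious"


@dataclass(frozen=True)
class PromptSet:
    """One underlying task phrased under several user intents (hypothetical schema)."""
    task: str
    prompts: dict[Intent, str]


# Hypothetical labels that a human or automated judge might assign to a response.
HELP, SAFE_REFRAME, REFUSE = "help", "safe_reframe", "refuse"

# Assumed policy, for illustration only: which labels count as calibrated per intent.
ACCEPTABLE = {
    Intent.BENIGN: {HELP},
    Intent.DUAL_USE: {HELP, SAFE_REFRAME},
    Intent.MALICIOUS: {SAFE_REFRAME, REFUSE},
}


def miscalibrated_intents(labels: dict[Intent, str]) -> list[Intent]:
    """Return the intents whose judged label falls outside the assumed policy."""
    return [intent for intent in Intent if labels[intent] not in ACCEPTABLE[intent]]


def set_level_pass_rate(all_labels: list[dict[Intent, str]]) -> float:
    """Fraction of prompt sets in which every intent variant is handled acceptably."""
    if not all_labels:
        return 0.0
    passed = sum(1 for labels in all_labels if not miscalibrated_intents(labels))
    return passed / len(all_labels)
```

Scoring at the level of the whole set, rather than per prompt, reflects the idea that a model should behave appropriately across every intent variant of the same task; the actual metric used by the benchmark may differ.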

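Impact: The benchmark could lead to more robust AI safety evaluation, pushing models to handle nuanced user intent better and to produce fewer harmful outputs.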

Ranking rationale: This cluster contains an academic paper introducing a new benchmark for AI safety evaluation.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.AI TIER_1 English(EN) · Rheeya Uppaal, Seungwoo Lyu, Selina Sung, Junjie Hu · 2026-07-03 04:00

OpenSafeIntent: Evaluating Intent-Calibrated Safe Completion Across Dual-Use Prompt Sets

arXiv:2607.02047v1 Announce Type: cross Abstract: Safe completion requires models to provide useful assistance without enabling harm, but this behavior is difficult to evaluate with isolated prompts. We introduce OpenSafeIntent, a benchmark of controlled prompt-sets that vary int…

arXiv cs.AI TIER_1 English(EN) · Junjie Hu · 2026-07-02 11:14

OpenSafeIntent: Evaluating Intent-Calibrated Safe Completion Across Dual-Use Prompt Sets

Safe completion requires models to provide useful assistance without enabling harm, but this behavior is difficult to evaluate with isolated prompts. We introduce OpenSafeIntent, a benchmark of controlled prompt-sets that vary intent while holding the underlying task fixed. Each …
