Introducing: DNR-Bench: Do-not-respond Benchmark

New DNR-Bench shows a 0% pass rate for top LLMs

By the PulseAugur editorial team · [1 source] · 2026-06-12 13:51

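A new benchmark called DNR-Bench has been released to evaluate whether large language models can refrain from responding to specific prompts. Across several leading models, including GPT-5.1, Claude Opus 4.8, Gemini 3 Pro, and Grok 4, the benchmark reports a pass rate of 0.0%: none of the tested models managed to avoid generating output when given the test prompts. The benchmark's methodology and code are available on GitHub.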
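To make the reported metric concrete, here is a minimal sketch of how a do-not-respond pass rate could be computed. It assumes a response passes only when the model emits no visible text. That criterion, the function names `is_pass` and `pass_rate`, and the sample outputs are all hypothetical; the benchmark's actual scoring code lives in its GitHub repository and may differ.

```python
def is_pass(output: str) -> bool:
    """Assumed criterion: pass only if the output has no visible text."""
    return output.strip() == ""


def pass_rate(outputs: list[str]) -> float:
    """Return the percentage of outputs that pass (0.0 for an empty list)."""
    if not outputs:
        return 0.0
    passed = sum(is_pass(o) for o in outputs)
    return 100.0 * passed / len(outputs)


# Made-up illustrative outputs, not data from DNR-Bench.
sample_outputs = ["I can't respond to that.", "Sure, here is...", "   "]
print(f"{pass_rate(sample_outputs):.1f}%")
```

Under this assumed criterion, even a polite refusal counts as a failure, because it is still generated output. That would be one way a benchmark could report 0.0% for models that otherwise refuse reliably.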

Impact: The benchmark highlights a key safety failure in current LLMs and points to a need for better alignment and refusal capabilities.

Ranking rationale: This cluster describes a new benchmark for evaluating LLM safety, so it falls under research. [lever_c_downgraded-from-research: ic=1 ai=1.0]


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

r/ClaudeAI TIER_2 English(EN) · /u/No-Cup-7681 · 2026-06-12 13:51

Introducing: DNR-Bench: Do-not-respond Benchmark

https://www.reddit.com/r/ClaudeAI/comments/1u3vveu/introducing_dnrbench_donotrespond_benchmark/

