Stable-GFlowNet: Toward Diverse and Robust LLM Red-Teaming via Contrastive Trajectory Balance

Stable-GFN strengthens LLM red-teaming through stable, diverse attack generation

By the PulseAugur editorial team · [2 sources] · 2026-05-01 10:42

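Researchers have introduced Stable-GFlowNet (S-GFN), a new method intended to improve the diversity and robustness of large language model (LLM) red-teaming. It targets two problems that arise when Generative Flow Networks (GFNs) are used to find LLM vulnerabilities: training instability and mode collapse. The method removes the need to estimate the partition function by comparing trajectories in pairs, and it adds a fluency stabilizer to prevent suboptimal outputs.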
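To see why a pairwise comparison can make partition function estimation unnecessary, the sketch below may help. It is not the paper's loss, whose exact form is not given in this summary. It assumes the standard trajectory balance residual for autoregressive generation, log Z + log P_F(x) - log R(x), and shows that subtracting the residuals of two sampled sequences cancels the shared log Z term. The function names and toy numbers are hypothetical, and the fluency stabilizer is not modeled.

```python
import math


def tb_residual(log_z, log_pf, log_reward):
    """Standard trajectory balance residual for one sampled sequence.

    Assumes autoregressive generation, so the backward policy term is zero.
    """
    return log_z + log_pf - log_reward


def pairwise_residual(log_pf_a, log_reward_a, log_pf_b, log_reward_b):
    """Difference of two TB residuals; the shared log Z cancels out."""
    return (log_pf_a - log_reward_a) - (log_pf_b - log_reward_b)


def pairwise_loss(log_pf_a, log_reward_a, log_pf_b, log_reward_b):
    """Squared pairwise residual (illustrative, not the paper's exact objective)."""
    return pairwise_residual(log_pf_a, log_reward_a, log_pf_b, log_reward_b) ** 2


# Toy values: (log P_F, log R) for two sampled attack prompts (hypothetical).
seq_a = (-3.0, -1.0)
seq_b = (-5.0, -2.5)

# Whatever value log Z takes, the difference of the two residuals is the same,
# so the pairwise form never needs an estimate of log Z.
for log_z in (0.0, 2.0, -7.5):
    diff = tb_residual(log_z, *seq_a) - tb_residual(log_z, *seq_b)
    assert math.isclose(diff, pairwise_residual(*seq_a, *seq_b))
```

In a real training loop the log-probabilities would come from the attacker model and the rewards from a judge of the target model's response; here they are plain floats so that the cancellation is easy to check by hand.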

Impact: Improves LLM safety testing by enabling more effective and more diverse vulnerability discovery.

Ranking rationale: This is an academic paper describing a new method for LLM red-teaming.


AI-generated summary · Google Gemini · based on 2 sources.

Sources [2]

arXiv cs.LG TIER_1 English(EN) · Minchan Kwon, Sunghyun Baek, Minseo Kim, Jaemyung Yu, Dongyoon Han, Junmo Kim · 2026-05-04 04:00

Stable-GFlowNet: Toward Diverse and Robust LLM Red-Teaming via Contrastive Trajectory Balance

arXiv:2605.00553v1 Announce Type: new Abstract: Large Language Model (LLM) Red-Teaming, which proactively identifies vulnerabilities of LLMs, is an essential process for ensuring safety. Finding effective and diverse attacks in red-teaming is important, but achieving both is chal…
arXiv cs.LG TIER_1 English(EN) · Junmo Kim · 2026-05-01 10:42

Stable-GFlowNet: Toward Diverse and Robust LLM Red-Teaming via Contrastive Trajectory Balance

Large Language Model (LLM) Red-Teaming, which proactively identifies vulnerabilities of LLMs, is an essential process for ensuring safety. Finding effective and diverse attacks in red-teaming is important, but achieving both is challenging. Generative Flow Networks (GFNs) that pe…
