Looking for backdoors in Jane Street LLMs

Jane Street LLM backdoor challenge reveals model vulnerabilities

By the PulseAugur editorial team · [1 source] · 2026-05-23 02:17

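Jane Street ran a challenge to find hidden backdoors in large language models, and one participant's write-up offers insight into how such vulnerabilities can be found. After initial attempts based on activations and prompting failed, the author identified some of the backdoors using white-box methods. The challenge involved four models: a fine-tuned Qwen2.5-7B-Instruct and three large DeepSeek-V3 Mixture-of-Experts models, with the large models accessible only through an API.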

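Impact: The challenge highlights potential security risks in LLMs and the ongoing research into detecting and mitigating such vulnerabilities.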



AI-generated summary · Google Gemini · from 1 source.

Sources [1]

Alignment Forum · Cipolla · 2026-05-23 02:17 (English)

Looking for backdoors in Jane Street LLMs

I am going to talk about my experience in the Jane Street LLM backdoor challenge. I am sharing partial results. I managed to crack some of the models using white-box methods, after the activation/prompting approach didn't pan out. Happy to discuss better or more promi…
