Trusted Weights, Treacherous Optimizations? Optimization-Triggered Backdoor Attacks on LLMs

New LLM vulnerabilities found in compilation and trigger strength

By the PulseAugur editorial team · [5 sources] · 2026-05-20 02:55

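Researchers have identified new vulnerabilities in large language models (LLMs) tied to the optimization techniques used during deployment. One study shows that compilation, a process meant to improve efficiency, can be exploited to plant hidden backdoors that fire only under specific compilation conditions, bypass standard safety checks, and achieve high attack success rates on open-source LLMs. A separate theoretical paper examines how, counter-intuitively, stronger triggers in backdoor attacks can sometimes help the defender in high-dimensional settings, with the attack success rate peaking at a finite trigger strength.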
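The abstract of the compilation paper, quoted in the source list, attributes the issue to numerical side effects of compilation, even though the original and compiled graphs are assumed to be semantically equivalent. A minimal sketch of one well-known source of such differences in general, floating-point non-associativity, is given here. It is not the paper's attack: the values, the threshold, and the function name `sum_in_order` are assumptions chosen purely for illustration.

```python
# Illustration only: the same four numbers, summed in two orders,
# give different float64 results. An optimizer that reorders a
# reduction could in principle change a value in this way.

def sum_in_order(values):
    """Accumulate left to right, as a naive unoptimized loop would."""
    total = 0.0
    for v in values:
        total += v
    return total

values = [1e16, 1.0, -1e16, 1.0]

# Original order: the first 1.0 is absorbed by 1e16 and lost to rounding.
eager = sum_in_order(values)

# Hypothetical reordering: the large terms cancel first, so both 1.0s survive.
reordered = sum_in_order([values[0], values[2], values[1], values[3]])

# A hypothetical downstream decision that depends on the value.
THRESHOLD = 1.5
print(eager, eager > THRESHOLD)
print(reordered, reordered > THRESHOLD)
```

Both sums are mathematically equal, yet the comparison against the threshold comes out differently. This shows how behavior that depends on exact numerical values could differ between two otherwise equivalent computations.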

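Impact: The new research highlights critical security vulnerabilities in LLM deployment pipelines that may affect the safety and reliability of AI systems.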

Ranking rationale: Multiple academic papers published on arXiv detail new research on LLM vulnerabilities and the theory of backdoor attacks.

AI-generated summary · Google Gemini · from 5 sources.

Sources [5]

arXiv cs.AI TIER_1 English(EN) · Yifei Wang, Tianlin Li, Xiaohan Zhang, Yida Yang, Xiaoyu Zhang, Li Pan · 2026-05-22 04:00

Trusted Weights, Treacherous Optimizations? Optimization-Triggered Backdoor Attacks on LLMs

arXiv:2605.20641v1 Announce Type: cross Abstract: Inference optimization is a vital technique for deploying LLMs at scale. Compilation is the most widely adopted optimization technique for LLMs. While it assumes semantic equivalence between the original and compiled graphs, we fi…
arXiv cs.LG TIER_1 English(EN) · Aman Saxena, Jan Schuchardt, Yan Scholten, Stephan Günnemann · 2026-05-22 04:00

Provable Robustness against Backdoor Attacks via the Primal-Dual Perspective on Differential Privacy

arXiv:2605.21780v1 Announce Type: new Abstract: Randomized smoothing is a powerful tool for certifying robustness to adversarial perturbations, including poisoning attacks via randomized training and evasion attacks via randomized inference. Extending these guarantees to backdoor…
arXiv cs.LG TIER_1 English(EN) · Donald Flynn, Hadas Yaron Goldhirsh, Jonathan P. Keating, Inbar Seroussi · 2026-05-22 04:00

When Stronger Triggers Backfire: A High-Dimensional Theory of Backdoor Attacks

arXiv:2605.22481v1 Announce Type: new Abstract: Backdoor poisoning attacks behave counter-intuitively in high dimensions: stronger training triggers can help the defender. We study regularised generalised linear models on Gaussian-mixture data in the proportional regime ($p/n \to…
arXiv cs.LG TIER_1 English(EN) · Inbar Seroussi · 2026-05-21 13:39

When Stronger Triggers Backfire: A High-Dimensional Theory of Backdoor Attacks

Backdoor poisoning attacks behave counter-intuitively in high dimensions: stronger training triggers can help the defender. We study regularised generalised linear models on Gaussian-mixture data in the proportional regime ($p/n \to κ$), varying the training trigger strength $α$ …
arXiv cs.AI TIER_1 English(EN) · Li Pan · 2026-05-20 02:55

Trusted Weights, Treacherous Optimizations? Optimization-Triggered Backdoor Attacks on LLMs

Inference optimization is a vital technique for deploying LLMs at scale. Compilation is the most widely adopted optimization technique for LLMs. While it assumes semantic equivalence between the original and compiled graphs, we first uncover its numerical side effects can be mali…
