Dissecting the Black Box: Circuit-Level Analysis of LLM Vulnerability Detection

LLM vulnerability detection relies on safe-code patterns, not direct vulnerability signatures

By the PulseAugur editorial team · [1 source] · 2026-05-29 04:00

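Researchers used mechanistic interpretability to analyze how large language models (LLMs) detect software vulnerabilities, focusing on the Gemma-2-2b model. Their study shows that the model identifies vulnerable code mainly by using specific attention heads to recognize safe coding patterns, rather than by directly detecting vulnerability signatures. The circuit-level analysis pinpoints the neural components that are critical to the model's security predictions, including attention heads in early layers and MLP neurons in layer 7. Ablation experiments establish the causal role of these components: removing them substantially lowers detection accuracy, which shows that the LLM's vulnerability-detection circuit is sparse and interpretable.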
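
The ablation logic described above can be illustrated with a toy sketch. Everything in it is an assumption for illustration: the component names head_safe and mlp_l7, their logit contributions, and the four-example dataset are hypothetical, and zero-ablation (setting one component's contribution to zero) is used as a simple stand-in, since the summary does not say which ablation scheme the authors used. The sketch only shows the shape of the measurement: compare detection accuracy before and after removing a single component.

```python
# Toy zero-ablation sketch. All component names and numbers are hypothetical;
# this is not Gemma-2-2b and not the paper's actual procedure.

# Each example maps component name -> contribution to the "vulnerable" logit,
# paired with the true label (1 = vulnerable, 0 = safe).
# "head_safe" stands in for a hypothetical attention head that pushes the
# logit strongly negative when it recognizes a safe coding pattern.
DATASET = [
    ({"head_safe": -3.0, "mlp_l7": -1.0, "other": 1.5}, 0),
    ({"head_safe": -2.5, "mlp_l7": -0.5, "other": 1.5}, 0),
    ({"head_safe": 0.0, "mlp_l7": 0.5, "other": 1.5}, 1),
    ({"head_safe": 0.1, "mlp_l7": 1.0, "other": 1.5}, 1),
]


def predict(contributions, ablated=frozenset()):
    """Predict 1 (vulnerable) if the summed logit of non-ablated components is positive."""
    logit = sum(v for name, v in contributions.items() if name not in ablated)
    return 1 if logit > 0 else 0


def accuracy(dataset, ablated=frozenset()):
    """Fraction of examples classified correctly with the given components zeroed out."""
    correct = sum(predict(c, ablated) == label for c, label in dataset)
    return correct / len(dataset)


def ablation_effects(dataset):
    """Accuracy drop caused by zero-ablating each component on its own."""
    baseline = accuracy(dataset)
    components = dataset[0][0].keys()
    return {name: baseline - accuracy(dataset, frozenset({name})) for name in components}


if __name__ == "__main__":
    print("baseline accuracy:", accuracy(DATASET))
    for name, drop in ablation_effects(DATASET).items():
        print(f"ablate {name}: accuracy drop {drop:.2f}")
```

In this toy setup the hypothetical safe-pattern head carries the decision on its own: zeroing it flips both safe examples to "vulnerable" and halves accuracy, while ablating either of the other components changes nothing. That is a miniature version of what a sparse circuit looks like under ablation.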

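Impact: The finding that LLMs detect vulnerabilities by recognizing safe code patterns suggests room for targeted improvements to AI-driven security tools.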

Ranking rationale: An academic paper detailing a novel method for analyzing an LLM's internal computations to understand how it reasons when detecting vulnerabilities.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.LG · Syafiq Al Atiiq, Chun Zhou, Christian Gehrmann · 2026-05-29 04:00

Dissecting the Black Box: Circuit-Level Analysis of LLM Vulnerability Detection

arXiv:2605.29901v1 Announce Type: cross Abstract: Large language models (LLMs) can detect software vulnerabilities, but how do they actually identify vulnerable code? We address this question using mechanistic interpretability; analyzing the internal computations of a neural netw…
