Snyk VulnBench JS 1.0: Can LLMs Find the Same Bugs Twice?

Benchmark shows LLM repeatability in security audits is inconsistent

By the PulseAugur editorial team · [1 source] · 2026-06-16 04:00

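A new benchmark, Snyk VulnBench JS 1.0, has been introduced to evaluate how repeatable large language model (LLM) security reviews are. It finds that although LLM findings can vary substantially from one run to the next, findings that match the benchmark's reference set are considerably more stable. The study suggests that combining agentic LLM security review with deterministic static application security testing (SAST) tools such as Snyk Code yields a more robust solution than relying on either approach alone.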
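The summary does not say how repeatability was scored. Purely as an illustration, one common way to quantify run-to-run agreement is the mean pairwise Jaccard similarity between the sets of findings produced by repeated scans. The sketch assumes that metric; the function names, finding labels, and data are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two finding sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(runs):
    """Average Jaccard similarity over every pair of runs (1.0 if fewer than two runs)."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def restrict_to_reference(runs, reference):
    """Keep only the findings in each run that match the reference (ground-truth) set."""
    return [run & reference for run in runs]


# Hypothetical findings from three repeated scans, labelled "file:issue".
runs = [
    {"db.js:sql-injection", "views.js:xss", "util.js:prototype-pollution"},
    {"db.js:sql-injection", "views.js:xss", "auth.js:weak-hash"},
    {"db.js:sql-injection", "views.js:xss"},
]
# Hypothetical reference set of known vulnerabilities.
reference = {"db.js:sql-injection", "views.js:xss"}

all_findings = mean_pairwise_jaccard(runs)
matched_only = mean_pairwise_jaccard(restrict_to_reference(runs, reference))
print(f"all findings: {all_findings:.2f}, reference-matched: {matched_only:.2f}")
```

With these made-up runs, agreement over all findings is 11/18 (about 0.61), while agreement restricted to reference-matched findings is 1.00. This mirrors, in miniature, the qualitative pattern the summary reports: the extra findings differ from run to run, while the real, reference-matched ones recur.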

Impact: Underscores the need for a hybrid approach to AI-assisted code security, pairing LLMs with traditional SAST tools to improve reliability.

Ranking rationale: This cluster contains one academic paper describing a new benchmark and its findings.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.AI · Liran Tal, Johannes Kloos, Arsenii Rudich, Stephen Thoemmes, Manoj Nair · 2026-06-16 04:00

Snyk VulnBench JS 1.0: Can LLMs Find the Same Bugs Twice?

arXiv:2606.15762v1 Announce Type: cross Abstract: We ran 300 repeated vulnerability-finding scans to measure how repeatable agentic large language model (LLM) security review is on the same JavaScript code, prompt, and benchmark harness. The headline result is that LLM security f…
