New research explores LLM-based vulnerability detection, improving accuracy and analyzing prompt sensitivity

By PulseAugur Editorial · [2 sources] · 2026-05-26 04:00

Two new research papers explore the use of large language models (LLMs) for detecting software vulnerabilities. The first introduces VULPO, an on-policy optimization framework that uses a new dataset, ContextVul, to improve how well LLMs identify vulnerabilities by taking contextual information and reasoning traces into account. According to the paper, VULPO-4B, a specialized LLM, significantly outperforms existing methods. The second presents PromptAudit, a framework for evaluating how sensitive LLM-based vulnerability detection is to prompt formulation; it finds that chain-of-thought prompting is generally effective, but that prompt variations can significantly alter model performance and reliability.

IMPACT These studies highlight advancements in using LLMs for code security, potentially leading to more robust automated vulnerability detection tools.

RANK_REASON Two academic papers published on arXiv detailing new methods and analyses for using LLMs in vulnerability detection.

paper
safety

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv cs.AI TIER_1 English(EN) · Youpeng Li, Fuxun Yu, Weiliang Qi, Xinda Wang · 2026-05-28 04:00

VULPO: Context-Aware Vulnerability Detection via On-Policy LLM Optimization

arXiv:2511.11896v3 Announce Type: replace-cross Abstract: Large language models (LLMs) have recently shown strong potential in vulnerability detection (VD). However, accurately detecting vulnerabilities in real-world repositories requires reasoning over complex contextual interac…

arXiv cs.AI TIER_1 English(EN) · Steffen J. Camarato, Yahya Hmaiti, Mandana Ghadamian, David Mohaisen · 2026-05-26 04:00

PromptAudit: Auditing Prompt Sensitivity in LLM-Based Vulnerability Detection

arXiv:2605.24171v1 Announce Type: cross Abstract: Large language models are increasingly used for vulnerability detection, yet their reliability under different prompt formulations remains uncharacterized. We present PromptAudit, a controlled evaluation framework that isolates pr…
