LLM attack benchmarks cover less than 25% of threat landscape

By PulseAugur Editorial · [1 source] · 2026-06-04 04:00

Researchers have developed a framework for auditing how well benchmarks of Large Language Model (LLM) attacks cover the threat surface. Built on a 507-leaf taxonomy of inference-time attacks, the audit finds that current leading benchmarks cover less than 25% of the potential threat landscape. Notably, categories such as Service Disruption and Model Internals have no standardized evaluation, even though successful attacks in these areas have been documented.

IMPACT Highlights significant gaps in LLM security evaluations, potentially guiding future benchmark development and red-teaming efforts.

RANK_REASON Academic paper introducing a new taxonomy and audit framework for LLM security benchmarks.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CL TIER_1 English(EN) · Karthik Raghu Iyer, Yazdan Jamshidi, Nicholas Bray, Alexey A. Shvets · 2026-06-04 04:00

Talk is (Not) Cheap: A Taxonomy and Benchmark Coverage Audit for LLM Attacks

arXiv:2605.15118v2 · Abstract: We introduce a reusable framework for auditing whether LLM attack benchmarks collectively cover the threat surface: a 4×6 Target × Technique matrix grounded in STRIDE, constructed from a 507-leaf taxonomy -- …
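
To make the idea of a coverage audit concrete, here is a minimal Python sketch of how coverage over a 4×6 Target × Technique grid might be tallied. Only the 4×6 shape comes from the abstract. The axis labels, the `coverage` function, and the benchmark data are all hypothetical placeholders, and the paper's actual metric may differ (for example, it may count taxonomy leaves rather than matrix cells).

```python
from itertools import product

# Placeholder axis labels: the source gives only the 4x6 shape, not the names.
TARGETS = [f"target_{i}" for i in range(4)]
TECHNIQUES = [f"technique_{j}" for j in range(6)]


def coverage(benchmarks):
    """Return (fraction of matrix cells covered, sorted list of uncovered cells).

    `benchmarks` maps a benchmark name to the (target, technique) cells it tests.
    A cell counts as covered if at least one benchmark includes it.
    """
    all_cells = set(product(TARGETS, TECHNIQUES))
    covered = set()
    for cells in benchmarks.values():
        # Ignore any cell that is not part of the matrix.
        covered |= set(cells) & all_cells
    uncovered = sorted(all_cells - covered)
    return len(covered) / len(all_cells), uncovered


# Made-up toy data for illustration only, not results from the paper.
toy_benchmarks = {
    "bench_a": [("target_0", "technique_0"), ("target_0", "technique_1")],
    "bench_b": [("target_0", "technique_1"), ("target_1", "technique_0")],
}

fraction, gaps = coverage(toy_benchmarks)
print(f"{fraction:.1%} of cells covered, {len(gaps)} uncovered")
```

Taking the union across benchmarks reflects the abstract's question of whether benchmarks collectively cover the threat surface: overlapping benchmarks add no coverage, and the `gaps` list names the cells that no benchmark touches.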
