PulseAugur

LLM attack benchmarks cover less than 25% of threat landscape

Researchers have introduced a reusable framework for auditing how much of the threat surface is collectively covered by benchmarks for attacks on Large Language Models (LLMs). The framework is built on a taxonomy of over 500 inference-time attacks (507 leaves, per the abstract), and the audit finds that current leading benchmarks cover less than 25% of that threat landscape. Categories such as Service Disruption and Model Internals have no standardized evaluation, despite documented successful attacks in these areas.
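To make the coverage figure concrete, here is a minimal sketch of how such an audit could be computed. It is not the authors' implementation. The leaf names, cell labels, and benchmark names are hypothetical placeholders, and the function `coverage_audit` is introduced here for illustration only. The idea is to map each taxonomy leaf to a (target, technique) cell, take the union of leaves exercised by all benchmarks, and report the covered fraction overall and per cell.

```python
from collections import defaultdict

# Hypothetical toy taxonomy: attack leaf -> (target, technique) cell.
# All labels are illustrative placeholders, not the paper's actual axes.
TAXONOMY = {
    "prompt-injection-a": ("Model Behavior", "Tampering"),
    "prompt-injection-b": ("Model Behavior", "Tampering"),
    "token-flood": ("Service Disruption", "Denial of Service"),
    "weight-extraction": ("Model Internals", "Information Disclosure"),
}

# Hypothetical benchmarks: name -> set of taxonomy leaves each one exercises.
BENCHMARKS = {
    "bench-x": {"prompt-injection-a"},
    "bench-y": {"prompt-injection-a", "prompt-injection-b"},
}


def coverage_audit(taxonomy, benchmarks):
    """Return (overall covered fraction, {cell: [covered, total]})."""
    # Leaves exercised by at least one benchmark, restricted to known leaves.
    covered = set().union(*benchmarks.values()) & set(taxonomy)
    per_cell = defaultdict(lambda: [0, 0])
    for leaf, cell in taxonomy.items():
        per_cell[cell][1] += 1          # total leaves in this cell
        if leaf in covered:
            per_cell[cell][0] += 1      # leaves covered in this cell
    overall = len(covered) / len(taxonomy)
    return overall, dict(per_cell)


overall, per_cell = coverage_audit(TAXONOMY, BENCHMARKS)
# Cells with zero covered leaves are the gaps an audit would flag.
gaps = [cell for cell, (hit, _total) in per_cell.items() if hit == 0]
```

In this toy data the two benchmarks overlap on the same cell, so adding more benchmarks of the same kind would not close the gaps in the other cells; only the union of covered leaves matters for the audit.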

IMPACT Highlights significant gaps in LLM security evaluations, potentially guiding future benchmark development and red-teaming efforts.

RANK_REASON Academic paper introducing a new taxonomy and audit framework for LLM security benchmarks.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.CL TIER_1 English(EN) · Karthik Raghu Iyer, Yazdan Jamshidi, Nicholas Bray, Alexey A. Shvets

    Talk is (Not) Cheap: A Taxonomy and Benchmark Coverage Audit for LLM Attacks

    arXiv:2605.15118v2 Announce Type: replace-cross Abstract: We introduce a reusable framework for auditing whether LLM attack benchmarks collectively cover the threat surface: a $4 \times 6$ Target $\times$ Technique matrix grounded in STRIDE, constructed from a 507-leaf taxonomy -- …