Talk is (Not) Cheap: A Taxonomy and Benchmark Coverage Audit for LLM Attacks
Researchers have developed a framework for auditing how much of the threat landscape is covered by benchmarks that test attacks on Large Language Models (LLMs). The framework, built on a taxonomy of over 500 inference-time attacks, finds that current leading benchmarks cover less than 25% of the potential threat landscape. Notably, categories such as Service Disruption and Model Internals lack standardized evaluation, despite documented successful attacks in these areas.
AI IMPACT: Highlights significant gaps in LLM security evaluations, potentially guiding future benchmark development and red-teaming efforts.