New gaslighting attacks reveal 24% accuracy drop in speech LLMs

By PulseAugur Editorial · [1 source] · 2026-05-25 04:00

Researchers have developed a new method to test the vulnerability of speech-based large language models (LLMs) to manipulative prompts, termed "gaslighting attacks." These attacks employ five strategies—Anger, Cognitive Disruption, Sarcasm, Implicit, and Professional Negation—to evaluate how LLMs respond to misleading or overriding input. Across five different speech and multi-modal LLMs, these attacks led to an average accuracy decrease of 24.3%, highlighting significant behavioral vulnerabilities in current speech AI systems and underscoring the need for more robust and trustworthy technology. AI

IMPACT Introduces novel attack vectors that could compromise speech AI systems, necessitating new safety and robustness research.

RANK_REASON Academic paper introducing a new attack methodology and benchmark for speech LLMs. [lever_c_demoted from research: ic=1 ai=1.0]

Read on arXiv cs.CL →

paper
safety

AI-generated summary · Google Gemini · from 1 sources. How we write summaries →

COVERAGE [1]

arXiv cs.CL TIER_1 · Jinyang Wu, Bin Zhu, Xiandong Zou, Qiquan Zhang, Xu Fang, Pan Zhou · 2026-05-25 04:00

Benchmarking Gaslighting Attacks Against Speech Large Language Models

arXiv:2509.19858v2 Announce Type: replace Abstract: As Speech Large Language Models (Speech LLMs) become increasingly integrated into voice-based applications, ensuring their robustness against manipulative or adversarial input becomes critical. Although prior work has studied ad…

COVERAGE [1]

Benchmarking Gaslighting Attacks Against Speech Large Language Models

RELATED TOPICS