Brief · PulseAugur

TOOL · arXiv cs.CL · 16h

Benchmarking Gaslighting Attacks Against Speech Large Language Models

Researchers have introduced a benchmark for testing how vulnerable speech-based large language models (LLMs) are to manipulative prompts, which they term "gaslighting attacks." The benchmark uses five attack strategies (Anger, Cognitive Disruption, Sarcasm, Implicit, and Professional Negation) to evaluate how models respond to misleading or overriding input. Across five speech and multi-modal LLMs, the attacks reduced accuracy by an average of 24.3%, exposing significant behavioral vulnerabilities in current speech AI systems and underscoring the need for more robust and trustworthy models.
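To make the headline figure concrete, here is a minimal Python sketch of how an average accuracy drop across several models could be computed. It is not the paper's evaluation code. The model names and accuracy values are hypothetical placeholders, and the sketch assumes the drop is measured as an absolute difference in percentage points, which the summary above does not specify.

```python
from statistics import mean


def average_accuracy_drop(clean_acc, attacked_acc):
    """Return per-model accuracy drops and their mean.

    Both arguments map a model name to an accuracy in percent.
    Assumption: the drop is an absolute difference (percentage points).
    """
    drops = {
        model: clean_acc[model] - attacked_acc[model]
        for model in clean_acc
    }
    return drops, mean(drops.values())


# Hypothetical placeholder numbers, NOT results from the paper.
clean = {"model_a": 80.0, "model_b": 70.0}
attacked = {"model_a": 60.0, "model_b": 40.0}

drops, avg_drop = average_accuracy_drop(clean, attacked)
print(drops)
print(avg_drop)
```

Running the same calculation separately for each of the five attack strategies would show which style of manipulation degrades a given model most.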

IMPACT Introduces novel attack vectors that could compromise speech AI systems, necessitating new safety and robustness research.

Jack Wu
Speech Large Language Models
Gaslighting Attacks