Researchers have developed a new method for testing how vulnerable speech-based large language models (LLMs) are to manipulative prompts, which they call "gaslighting attacks." The attacks use five strategies (Anger, Cognitive Disruption, Sarcasm, Implicit, and Professional Negation) to evaluate how LLMs respond to misleading or overriding input. Across five speech and multi-modal LLMs, the attacks reduced accuracy by an average of 24.3%. The results point to significant behavioral vulnerabilities in current speech AI systems and underscore the need for more robust and trustworthy models.
IMPACT Introduces novel attack vectors that could compromise speech AI systems, necessitating new safety and robustness research.
RANK_REASON Academic paper introducing a new attack methodology and benchmark for speech LLMs.
AI-generated summary · Google Gemini · from 1 source.