New Gaslighting Attacks Reveal a 24% Accuracy Drop in Speech Large Language Models

By PulseAugur Editorial · [1 source] · 2026-05-25 04:00

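Researchers have developed a new method, called "gaslighting attacks," for testing how vulnerable speech-based large language models (LLMs) are to manipulative prompts. The attacks use five strategies (anger, cognitive disruption, sarcasm, implicit, and professional negation) to evaluate how LLMs respond to misleading or overwhelming input. Across five different speech and multimodal LLMs, the attacks caused an average accuracy drop of 24.3%, highlighting significant behavioral vulnerability in current speech AI systems and underscoring the need for more robust and trustworthy technology.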
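As a rough illustration of how an average accuracy drop like the one reported above could be computed, here is a minimal sketch. It assumes the drop is measured in absolute percentage points and averaged first over strategies and then over models; the summary does not specify either choice. The function name, model names, and all numbers are hypothetical placeholders, not data from the paper.

```python
from statistics import mean

# The five strategy names follow the summary; everything else is hypothetical.
STRATEGIES = [
    "anger",
    "cognitive_disruption",
    "sarcasm",
    "implicit",
    "professional_negation",
]


def average_accuracy_drop(baseline, attacked):
    """Return the mean absolute accuracy drop, in percentage points.

    baseline: {model_name: accuracy without attack, in [0, 1]}
    attacked: {model_name: {strategy: accuracy under that attack, in [0, 1]}}
    """
    drops = []
    for model, base_acc in baseline.items():
        # Average the attacked accuracy over all five strategies for this model.
        attacked_acc = mean(attacked[model][s] for s in STRATEGIES)
        drops.append((base_acc - attacked_acc) * 100)
    # Average the per-model drops.
    return mean(drops)


# Made-up placeholder numbers, chosen only to keep the arithmetic easy to follow.
baseline = {"model_a": 0.80, "model_b": 0.60}
attacked = {
    "model_a": {s: 0.60 for s in STRATEGIES},
    "model_b": {s: 0.50 for s in STRATEGIES},
}
print(round(average_accuracy_drop(baseline, attacked), 1))
```

If the paper instead reports a relative drop, the per-model term would be divided by the baseline accuracy before averaging.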

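Impact: Introduces a new class of attack vectors that could compromise speech AI systems, calling for new security and robustness research.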

Ranking rationale: Academic paper introducing a new attack method and benchmark for speech LLMs.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CL TIER_1 · Jinyang Wu, Bin Zhu, Xiandong Zou, Qiquan Zhang, Xu Fang, Pan Zhou · 2026-05-25 04:00

Benchmarking Gaslighting Attacks Against Speech Large Language Models

arXiv:2509.19858v2. Abstract: As Speech Large Language Models (Speech LLMs) become increasingly integrated into voice-based applications, ensuring their robustness against manipulative or adversarial input becomes critical. Although prior work has studied ad…
