PsychoPass: Geometric Profiling of Multi-Turn Adversarial LLM Conversations

New framework models LLM conversations geometrically to detect attacks

By the PulseAugur editorial team · [1 source] · 2026-06-03 04:00

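Researchers have developed a framework called PsychoPass for detecting adversarial conversations with large language models. The method models a conversation as a geometric path through embedding space and analyzes the trajectory as a whole rather than individual turns in isolation. PsychoPass extracts geometric features from that path to predict potential attacks early in a conversation. According to the summary, it is robust across different encoders and outperforms baseline guardrails.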
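To make the idea of a conversation as a path in embedding space concrete, here is a minimal sketch. It assumes each turn has already been mapped to a fixed-length vector by some encoder. The function names (`trajectory_features`, `prefix_features`) and the particular features (path length, net displacement, straightness, mean turning angle) are illustrative assumptions, not the feature set described in the paper.

```python
import math


def trajectory_features(embeddings):
    """Summarize a conversation as a path through embedding space.

    `embeddings` is a list of equal-length float vectors, one per turn.
    The features here are hypothetical examples, not the paper's.
    """
    if len(embeddings) < 2:
        raise ValueError("need at least two turns to form a path")

    # Displacement vector between each pair of consecutive turns.
    steps = [
        [b - a for a, b in zip(p, q)]
        for p, q in zip(embeddings, embeddings[1:])
    ]
    step_lengths = [math.sqrt(sum(x * x for x in s)) for s in steps]

    path_length = sum(step_lengths)
    net_displacement = math.dist(embeddings[0], embeddings[-1])

    # Turning angle (radians) between consecutive steps.
    # Pairs involving a zero-length step are skipped.
    angles = []
    for i in range(len(steps) - 1):
        lu, lv = step_lengths[i], step_lengths[i + 1]
        if lu == 0.0 or lv == 0.0:
            continue
        dot = sum(a * b for a, b in zip(steps[i], steps[i + 1]))
        cos = max(-1.0, min(1.0, dot / (lu * lv)))
        angles.append(math.acos(cos))

    return {
        "path_length": path_length,
        "net_displacement": net_displacement,
        # 1.0 is a straight line; lower values mean a more winding path.
        "straightness": net_displacement / path_length if path_length > 0 else 0.0,
        "mean_turning_angle": sum(angles) / len(angles) if angles else 0.0,
    }


def prefix_features(embeddings, min_turns=2):
    """Recompute the features after each new turn, for early scoring."""
    return [
        trajectory_features(embeddings[:k])
        for k in range(min_turns, len(embeddings) + 1)
    ]
```

In a real pipeline the vectors would come from a sentence encoder, and the per-prefix feature dictionaries could feed a downstream classifier that scores the conversation as each turn arrives. Both of those steps are outside this sketch.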

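Impact: By analyzing the geometry of conversations, the work introduces a novel approach to LLM safety that could enable more robust, real-time detection of adversarial attacks.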

Ranking rationale: An academic paper detailing a new approach to LLM safety.

AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.CL · Muberra Ozmen, Subhabrata Majumdar · 2026-06-03 04:00

PsychoPass: Geometric Profiling of Multi-Turn Adversarial LLM Conversations

arXiv:2606.03136v1 Announce Type: cross Abstract: Multi-turn jailbreak attacks on large language models (LLMs) reveal a mismatch in current guardrails: they operate on individual turns, while attacks unfold as trajectories across conversations. We propose a shift from content to …