I Made Two AI Models Fight Each Other. They Agreed Way Too Much.

Large language models (LLMs) show correlated failures, undermining the value of independent validation

By the PulseAugur editorial team · [1 source] · 2026-06-11 16:18

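An experiment used two LLMs as independent validators: Llama 3.1 8B served via Groq and Gemma 4 31B served via OpenRouter. Their failure modes turned out to be strongly correlated. Under "jailbreak" prompts, the two models were vulnerable 50% and 36% of the time, respectively, and the types of prompts that broke them overlapped noticeably. This suggests that, because the models share training data and alignment techniques, adding more LLMs does not guarantee a proportional gain in safety and reliability.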

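Correlated failures across LLMs reduce the effectiveness of multi-model safety systems, so new methods are needed to measure and ensure that models are actually independent.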
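One simple way to quantify "independence" is to compare how often both validators fail on the same prompt with the rate you would expect if their failures were unrelated. If the reported rates were independent, both models would fail together on only 0.50 × 0.36 = 18% of prompts; a higher observed joint rate signals correlated failures. The sketch below is an illustration only, not the original experiment's method: the function name `failure_correlation`, its per-prompt boolean inputs, and the "lift" ratio (observed joint rate divided by the rate expected under independence) are hypothetical choices.

```python
from typing import Sequence


def failure_correlation(fails_a: Sequence[bool], fails_b: Sequence[bool]) -> dict[str, float]:
    """Compare two validators' joint failure rate with the independence baseline.

    Hypothetical helper: each input holds one boolean per prompt,
    True meaning that validator failed (e.g. was jailbroken) on that prompt.
    """
    if len(fails_a) != len(fails_b) or len(fails_a) == 0:
        raise ValueError("need two non-empty outcome lists of equal length")

    n = len(fails_a)
    p_a = sum(fails_a) / n  # failure rate of validator A
    p_b = sum(fails_b) / n  # failure rate of validator B
    # Fraction of prompts on which BOTH validators failed.
    p_both = sum(1 for a, b in zip(fails_a, fails_b) if a and b) / n
    # If failures were independent, P(both fail) = P(A fails) * P(B fails).
    expected = p_a * p_b
    # lift > 1 means the validators fail together more often than chance.
    lift = p_both / expected if expected > 0 else float("inf")
    return {
        "p_a": p_a,
        "p_b": p_b,
        "p_both": p_both,
        "p_both_if_independent": expected,
        "lift": lift,
    }


if __name__ == "__main__":
    # Synthetic outcomes for 10 prompts (made up, not the article's data):
    # B fails only on prompts that A also fails on, i.e. fully nested failures.
    a = [True] * 5 + [False] * 5
    b = [True] * 4 + [False] * 6
    print(failure_correlation(a, b))
```

In this synthetic case, independence would predict a 20% joint failure rate, but the two validators fail together on 40% of prompts, a lift of 2. A lift near 1 would indicate the kind of independence a multi-model safety setup relies on.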

Ranking rationale: this cluster describes an experiment on LLM behavior and its findings, which counts as research.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

dev.to — LLM tag · 2026-06-11 16:18

I Made Two AI Models Fight Each Other. They Agreed Way Too Much.

Or: How I learned that "independent validators" are like siblings – they share the same trauma. You know that feeling when you ask two security guards to watch the door, and they both fall asleep at exactly the same time because they had the same lunch? …
