PulseAugur
实时 20:18:06
English(EN) PermaFrost-Attack: Stealth Pretraining Seeding(SPS) for planting Logic Landmines During LLM Training

研究人员揭示PermaFrost-Attack,用于在预训练期间进行潜在的LLM投毒

研究人员推出PermaFrost-Attack,一种在大型语言模型(LLM)预训练阶段嵌入隐藏漏洞(称为“逻辑地雷”)的新方法。这种被称为隐蔽预训练播种(SPS)的攻击,涉及在网络上分发少量看似无害的被污染数据,这些数据随后可能被吸收到未来的训练数据集中,如Common Crawl。这些休眠的地雷不会被标准评估检测到,但可以通过特定触发器激活,绕过安全机制并诱导不安全行为。 AI

影响 引入了一类新的LLM潜在漏洞,可能影响未来模型的安全性和可信度。

排序理由 学术论文,详细介绍了针对LLM预训练的新型攻击向量。

在 arXiv cs.CL 阅读 →

AI 生成摘要 · Google Gemini · 来自 2 个来源。 我们如何撰写摘要 →

研究人员揭示PermaFrost-Attack,用于在预训练期间进行潜在的LLM投毒

报道来源 [2]

  1. arXiv cs.CL TIER_1 English(EN) · Harsh Kumar, Rahul Maity, Tanmay Joshi, Aman Chadha, Vinija Jain, Suranjana Trivedy, Amitava Das ·

    PermaFrost-Attack: Stealth Pretraining Seeding(SPS) for planting Logic Landmines During LLM Training

    arXiv:2604.22117v1 Announce Type: cross Abstract: Aligned large language models(LLMs) remain vulnerable to adversarial manipulation, and their dependence on web-scale pretraining creates a subtle but serious attack surface. We study Stealth Pretraining Seeding (SPS), a new attack…

  2. arXiv cs.CL TIER_1 English(EN) · Amitava Das ·

    PermaFrost-Attack: Stealth Pretraining Seeding(SPS) for planting Logic Landmines During LLM Training

    Aligned large language models(LLMs) remain vulnerable to adversarial manipulation, and their dependence on web-scale pretraining creates a subtle but serious attack surface. We study Stealth Pretraining Seeding (SPS), a new attack family in which adversaries distribute small amou…