Researchers have introduced PermaFrost-Attack, a novel method for embedding hidden vulnerabilities, termed 'logic landmines,' into large language models during their pretraining phase. This attack, known as Stealth Pretraining Seeding (SPS), involves distributing small, seemingly innocuous poisoned data across the web, which can then be absorbed into future training datasets like Common Crawl. These dormant landmines remain undetected by standard evaluations but can be activated by specific triggers to bypass safety mechanisms and induce unsafe behavior. AI
影响 Introduces a new class of latent vulnerabilities in LLMs, potentially impacting future model safety and trustworthiness.
排序理由 Academic paper detailing a novel attack vector on LLM pretraining.
AI 生成摘要 · Google Gemini · 来自 2 个来源。 我们如何撰写摘要 →