Stealthy World Model Manipulation via Data Poisoning

New SWAAP framework enables stealthy data poisoning of AI world models

By the PulseAugur editorial team · [1 source] · 2026-06-18 04:00

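Researchers have introduced SWAAP, a novel two-stage framework for manipulating the world models learned by AI agents. The method exploits the training process by poisoning fine-tuning trajectories, which undermines the agent's planning and adaptation. SWAAP aims to induce low-return behavior while remaining stealthy, making it hard to detect. Evaluations on continuous control tasks show significant performance degradation with only minimal changes to the clean data, highlighting a practical vulnerability in world model adaptation pipelines.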
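To make the general idea concrete, here is a minimal toy sketch of how small, bounded edits to fine-tuning transitions can bias a learned dynamics model. It is not the SWAAP algorithm, whose two stages are not described in this summary. The 1-D linear dynamics, the least-squares "world model", and the names `collect_clean`, `poison`, and `fit_linear_model` are all assumptions made for illustration.

```python
import random

def collect_clean(n, a=0.9, b=0.5, noise=0.01, seed=0):
    """Sample transitions (s, u, s_next) from toy dynamics s_next = a*s + b*u + noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        s = rng.uniform(-1.0, 1.0)   # state
        u = rng.uniform(-1.0, 1.0)   # action
        s_next = a * s + b * u + rng.gauss(0.0, noise)
        data.append((s, u, s_next))
    return data

def poison(data, fraction=0.1, eps=0.2, seed=1):
    """Hypothetical attack: shift s_next by at most eps on a small subset.

    Each shift follows the sign of the action, so the fitted model
    overestimates how strongly actions move the state.
    """
    rng = random.Random(seed)
    k = int(fraction * len(data))
    chosen = set(rng.sample(range(len(data)), k))
    out = []
    for i, (s, u, s_next) in enumerate(data):
        if i in chosen:
            s_next += eps if u >= 0 else -eps
        out.append((s, u, s_next))
    return out

def fit_linear_model(transitions):
    """Least-squares fit of s_next ~ a*s + b*u via the 2x2 normal equations."""
    sss = sum(s * s for s, u, n in transitions)
    ssu = sum(s * u for s, u, n in transitions)
    suu = sum(u * u for s, u, n in transitions)
    ssn = sum(s * n for s, u, n in transitions)
    sun = sum(u * n for s, u, n in transitions)
    det = sss * suu - ssu * ssu
    a = (ssn * suu - sun * ssu) / det
    b = (sun * sss - ssn * ssu) / det
    return a, b

if __name__ == "__main__":
    clean = collect_clean(500)
    dirty = poison(clean)
    print("clean fit (a, b):   ", fit_linear_model(clean))
    print("poisoned fit (a, b):", fit_linear_model(dirty))
```

In this toy setting only 10% of the transitions change, each by at most `eps`, yet the fitted action coefficient drifts upward. A planner that trusts such a model would misjudge the effect of its actions, which is the kind of degradation the summary describes at a much larger scale.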

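Impact: Highlights potential vulnerabilities in AI agents that rely on world models, which call for new robustness methods covering both training data and learned dynamics.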

Ranking rationale: An academic paper detailing a novel data poisoning method in machine learning.

AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.LG · Yibin Hu, Xiaolin Sun, Zizhan Zheng · 2026-06-18 04:00

Stealthy World Model Manipulation via Data Poisoning

arXiv:2606.18697v1. Abstract: Model-based learning agents use learned world models to predict future states, plan actions, and adapt to new environments. However, the process of updating world models from collected experience creates a training-time attack surfa…
