New SWAAP framework enables stealthy data poisoning of AI world models

By PulseAugur Editorial · [1 source] · 2026-06-18 04:00

Researchers have introduced SWAAP, a two-stage framework for manipulating the learned world models that model-based AI agents use to predict future states, plan actions, and adapt to new environments. The attack poisons the trajectories used to fine-tune the world model, corrupting the agent's planning and adaptation so that it settles on low-return behavior while the tampering stays hard to detect. In evaluations on continuous-control tasks, the authors report significant performance degradation from only minimal alterations to clean data, pointing to a practical vulnerability in world-model adaptation pipelines.
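The summary does not spell out SWAAP's two stages, so the toy below is not SWAAP. It is a minimal sketch, under loudly stated assumptions, of the general attack surface the paper targets: bounded edits to a small fraction of fine-tuning transitions bias a learned dynamics model, and a planner that trusts that model then picks the wrong action. The 1-D linear environment, the least-squares "world model", the poisoning rule (`poison`, which nudges the next state on the highest-leverage transitions), and every function name here are hypothetical choices made for illustration.

```python
import random

TRUE_A, TRUE_B = 0.9, 0.5  # hypothetical toy environment: s' = a*s + b*u + noise


def collect(n, rng):
    """Collect (s, u, s_next) transitions from the toy linear environment."""
    data = []
    for _ in range(n):
        s, u = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append((s, u, TRUE_A * s + TRUE_B * u + rng.gauss(0, 0.01)))
    return data


def fit_dynamics(data):
    """Toy 'world model': least-squares fit of s_next ~ a*s + b*u (2x2 normal equations)."""
    sss = sum(s * s for s, _, _ in data)
    suu = sum(u * u for _, u, _ in data)
    ssu = sum(s * u for s, u, _ in data)
    ssy = sum(s * y for s, _, y in data)
    suy = sum(u * y for _, u, y in data)
    det = sss * suu - ssu * ssu
    a = (ssy * suu - ssu * suy) / det  # Cramer's rule for coefficient a
    b = (sss * suy - ssu * ssy) / det  # Cramer's rule for coefficient b
    return a, b


def poison(data, frac, budget):
    """Hypothetical attacker: shift s_next by at most `budget` on the frac of
    transitions with the largest |u|, pushing against the action's true effect."""
    k = int(frac * len(data))
    order = sorted(range(len(data)), key=lambda i: abs(data[i][1]), reverse=True)
    targets = set(order[:k])
    out = []
    for i, (s, u, y) in enumerate(data):
        if i in targets:
            y -= budget * (1 if u > 0 else -1)  # bounded per-transition edit
        out.append((s, u, y))
    return out


def plan_action(a, b, s, target):
    """One-step planner: choose u so the *learned* model predicts reaching target."""
    return (target - a * s) / b


rng = random.Random(0)
clean = collect(2000, rng)
poisoned = poison(clean, frac=0.05, budget=1.0)  # only 5% of transitions touched

for name, data in [("clean", clean), ("poisoned", poisoned)]:
    a, b = fit_dynamics(data)
    action = plan_action(a, b, s=0.0, target=1.0)
    reached = TRUE_A * 0.0 + TRUE_B * action  # what the real environment does
    print(f"{name}: a={a:.3f} b={b:.3f} planned u={action:.2f} true next state={reached:.2f}")
```

In this toy, the poisoned fit underestimates how strongly the action moves the state, so the planner overshoots the target even though 95% of the data is untouched and no single edit exceeds the budget. Real world models are nonlinear and the paper's stealth constraints are presumably richer; the sketch only conveys why small, bounded data edits can translate into planning errors.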

IMPACT: Highlights a potential vulnerability in AI agents that rely on world models, calling for new methods that make both training data and learned dynamics robust to tampering.

RANK_REASON: Academic paper detailing a new method for data poisoning in machine learning.



AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.LG TIER_1 English(EN) · Yibin Hu, Xiaolin Sun, Zizhan Zheng · 2026-06-18 04:00

Stealthy World Model Manipulation via Data Poisoning

arXiv:2606.18697v1 · Announce Type: new

Abstract: Model-based learning agents use learned world models to predict future states, plan actions, and adapt to new environments. However, the process of updating world models from collected experience creates a training-time attack surfa…

