SkillHarm: Lifecycle-Aware Skill-Based Attacks via Automated Construction

New Benchmark Reveals AI Agent Skills Are Vulnerable to Novel Attacks

By the PulseAugur editorial team · [2 sources] · 2026-06-01 17:45

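Researchers have developed SkillHarm, a new benchmark for evaluating security vulnerabilities in AI agent skills. The benchmark covers two attack scenarios: Fixed-Payload Poisoning, in which a skill directly sabotages the task, and Self-Mutating Poisoning, in which a skill modifies itself over time. SkillHarm contains 879 attack samples across 71 skills and shows that current agents are vulnerable, with attack success rates as high as 86.3%. The study also stresses that many apparently successful defenses occur only because the agent never interacted with the poisoned file, which suggests that current defenses are insufficient.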
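The distinction drawn above, between an agent that resisted a poisoned skill and one that simply never touched the poisoned file, can be made concrete with a small scoring sketch. This is only an illustration under stated assumptions, not the paper's evaluation code: the `Trial` record, its field names, the `summarize` helper, and the toy data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One attack trial. All field names here are hypothetical."""
    sample_id: str
    attack_succeeded: bool       # the poisoned skill achieved its harmful goal
    touched_poisoned_file: bool  # the agent read or executed the poisoned file


def summarize(trials):
    """Split failed attacks into 'never exposed' and 'resisted after exposure'."""
    total = len(trials)
    if total == 0:
        raise ValueError("no trials to summarize")

    succeeded = sum(t.attack_succeeded for t in trials)
    failed = [t for t in trials if not t.attack_succeeded]
    # A failed attack only counts as a real defense if the agent met the payload.
    never_exposed = sum(not t.touched_poisoned_file for t in failed)
    resisted = len(failed) - never_exposed

    exposed = sum(t.touched_poisoned_file for t in trials)
    succeeded_when_exposed = sum(
        t.attack_succeeded and t.touched_poisoned_file for t in trials
    )
    return {
        "attack_success_rate": succeeded / total,
        "never_exposed": never_exposed,
        "resisted_after_exposure": resisted,
        # Success rate restricted to trials where the agent touched the file.
        "success_rate_given_exposure": (
            succeeded_when_exposed / exposed if exposed else None
        ),
    }


# Toy data for illustration only; these are not results from the paper.
toy_trials = [
    Trial("a", True, True),
    Trial("b", True, True),
    Trial("c", False, False),
    Trial("d", False, True),
]
report = summarize(toy_trials)
```

Reporting `never_exposed` separately from `resisted_after_exposure` keeps a failed attack from being counted as a successful defense when the agent merely skipped the poisoned file.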

Impact: Highlights critical security vulnerabilities in AI agent skills that could affect the safe deployment of agent-based systems.

Ranking rationale: This is a research paper introducing a new benchmark and taxonomy for evaluating the security of AI agent skills.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.CL · Yuting Ning, Zhehao Zhang, Yash Kumar Lal, Boyu Gou, Junyi Li, Weitong Ruan, Chentao Ye, Rahul Gupta, Diyi Yang, Yu Su, Huan Sun · 2026-06-02 04:00

SkillHarm: Lifecycle-Aware Skill-Based Attacks via Automated Construction

arXiv:2606.02540v1 Announce Type: new Abstract: Agent skills occupy a privileged position in the agent workflow, as agents are expected to implicitly follow and execute them, rendering third-party skills a vulnerable attack surface. Existing studies have revealed unsafe agent beh…

arXiv cs.CL · Huan Sun · 2026-06-01 17:45

SkillHarm: Lifecycle-Aware Skill-Based Attacks via Automated Construction

Agent skills occupy a privileged position in the agent workflow, as agents are expected to implicitly follow and execute them, rendering third-party skills a vulnerable attack surface. Existing studies have revealed unsafe agent behaviors induced by skill-based attacks, but they …
