Researchers have developed SkillHarm, a new benchmark for evaluating security vulnerabilities in AI agent skills. The benchmark covers two attack scenarios: Fixed-Payload Poisoning, in which a skill directly compromises a task, and Self-Mutating Poisoning, in which a skill alters itself over time. SkillHarm contains 879 attack samples across 71 skills, and attack success rates reach up to 86.3%, showing that current agents are vulnerable. The study also finds that many apparent defense successes occur only because the agent never engaged with the poisoned files, which indicates that current defenses are insufficient.
IMPACT: Highlights critical security flaws in AI agent skills, potentially impacting the safe deployment of agent-based systems.
RANK_REASON: This is a research paper introducing a new benchmark and taxonomy for evaluating AI agent skill security.
AI-generated summary · Google Gemini · from 2 sources.