SkillHarm: Lifecycle-Aware Skill-Based Attacks via Automated Construction
Researchers have developed SkillHarm, a benchmark that tests the security of AI agents by evaluating lifecycle-aware skill-based attacks. It includes automated methods for constructing poisoned skills and exposes significant vulnerabilities in current agents, with attack success rates of up to 86.3%. The findings also show that many apparent defense successes occur only because agents never engage with the poisoned files, indicating that current defenses are insufficient.
IMPACT: Highlights critical security flaws in AI agents, necessitating improved defenses for reliable agent deployment.