New detector flags malicious LLM agent skills with high precision

By PulseAugur Editorial · [1 sources] · 2026-06-22 14:41

Researchers have developed a new two-stage detection system called Locate-and-Judge to identify malicious skills within LLM agent marketplaces. This system first uses attention mechanisms to pinpoint high-risk instruction spans within a skill and then conducts a detailed examination of these selected spans. This approach significantly reduces computational costs compared to direct scanning, allowing for the auditing of entire marketplaces and achieving high precision in flagging malicious skills, many of which were confirmed through manual review. AI

IMPACT This research introduces a scalable method to secure LLM agent ecosystems against supply-chain attacks, potentially increasing trust and adoption of agentic systems.

RANK_REASON The cluster contains an academic paper detailing a new method for detecting malicious code in LLM agents. [lever_c_demoted from research: ic=1 ai=1.0]

Read on arXiv cs.AI →

paper
safety

AI-generated summary · Google Gemini · from 1 sources. How we write summaries →

New detector flags malicious LLM agent skills with high precision

COVERAGE [1]

arXiv cs.AI TIER_1 English(EN) · Tégawendé F. Bissyandé · 2026-06-22 14:41

Detecting Malicious Agent Skills in the Wild using Attention

LLM agents increasingly load skills, file-based packages of natural-language instructions written by third parties and distributed through marketplaces, that execute with the user's privileges. A single malicious skill can exfiltrate data, hijack the agent, or persist as a supply…

COVERAGE [1]

Detecting Malicious Agent Skills in the Wild using Attention

RELATED ENTITIES

RELATED TOPICS