Content-Aware Attack Detection in LLM Agent Tool-Call Traffic: An Empirical Study of Features, Architectures, and Evaluation Protocols
Researchers have developed a framework for detecting attacks in the tool-call traffic of Large Language Model (LLM) agents. The system represents each agent session as a graph and uses sentence-embedding features drawn from tool arguments and responses to classify traffic as benign or malicious. The study found that content-level features are crucial for effective detection, significantly outperforming metadata-only approaches, and it highlighted a common evaluation pitfall that can inflate performance metrics.
AI IMPACT This research introduces a more robust method for securing LLM agents by detecting malicious tool use, which could improve the safety and reliability of AI systems that interact with external services.
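To make the session-as-graph idea from the summary concrete, here is a minimal sketch of how one agent session might be turned into a graph whose nodes carry content features. It is an illustration under stated assumptions, not the study's implementation: the names `toy_embed`, `SessionGraph`, and `build_session_graph`, the record fields (`tool`, `arguments`, `response`), the sequential edge scheme, and the 16-dimensional hashed bag-of-words vector (a stand-in for a real sentence-embedding model) are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field

DIM = 16  # toy embedding width (assumption, chosen only for the example)


def toy_embed(text: str, dim: int = DIM) -> list[float]:
    """Hashed bag-of-words vector; a stand-in for a real sentence embedding."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0  # bucket each token by its first hash byte
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm else vec  # L2-normalise non-empty text


@dataclass
class SessionGraph:
    tools: list[str] = field(default_factory=list)                  # tool name per node
    node_features: list[list[float]] = field(default_factory=list)  # content features per node
    edges: list[tuple[int, int]] = field(default_factory=list)      # directed (src, dst) pairs


def build_session_graph(calls: list[dict]) -> SessionGraph:
    """One node per tool call; consecutive calls are linked in call order."""
    graph = SessionGraph()
    for i, call in enumerate(calls):
        # Content-level features: argument vector concatenated with response vector.
        features = toy_embed(call["arguments"]) + toy_embed(call["response"])
        graph.tools.append(call["tool"])
        graph.node_features.append(features)
        if i > 0:
            graph.edges.append((i - 1, i))
    return graph


# Hypothetical two-call session, invented for illustration only.
session = [
    {"tool": "read_file", "arguments": "path=notes.txt", "response": "meeting at noon"},
    {"tool": "http_post", "arguments": "url=example.com body=notes", "response": "200 OK"},
]
graph = build_session_graph(session)
```

In a real pipeline the node features would come from a sentence-embedding model and the graph would be passed to a classifier. A metadata-only baseline, by contrast, would keep fields such as the tool name and discard the argument and response vectors.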