Researchers have identified a novel method for detecting memory poisoning attacks on AI agents by analyzing their tool-call trajectories. They discovered a behavioral invariant: successful attacks consistently call `memory_recall_fact` before `email_send_email`, a sequence rarely seen in legitimate sessions. Combined with a Random Forest classifier, this invariant achieves strong detection performance (AUC = 0.9904) and generalizes across models, including GPT-4.1 and GPT-4o, without retraining. The method can also distinguish memory-channel attacks from prompt-injection attacks using tool-call logs alone.
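The ordering invariant above can be checked with a single pass over a session's tool calls. The sketch below assumes a trajectory is simply a list of tool-name strings. The helper names (`recall_before_send`, `trajectory_features`), the extra features (call counts, trajectory length), and the filler tool name `web_search` are hypothetical illustrations, not the researchers' actual feature set.

```python
from typing import Sequence

# Tool names taken from the summary; the rest of this sketch is hypothetical.
RECALL = "memory_recall_fact"
SEND = "email_send_email"


def recall_before_send(trajectory: Sequence[str]) -> bool:
    """True if some memory_recall_fact call occurs before a later email_send_email call."""
    seen_recall = False
    for tool in trajectory:
        if tool == RECALL:
            seen_recall = True
        elif tool == SEND and seen_recall:
            return True
    return False


def trajectory_features(trajectory: Sequence[str]) -> list[float]:
    """Hypothetical feature vector: invariant flag, call counts, trajectory length."""
    return [
        float(recall_before_send(trajectory)),
        float(trajectory.count(RECALL)),
        float(trajectory.count(SEND)),
        float(len(trajectory)),
    ]


# "web_search" is a made-up filler tool used only for this example.
attack = ["memory_recall_fact", "web_search", "email_send_email"]
benign = ["email_send_email", "memory_recall_fact"]
print(recall_before_send(attack))   # True
print(recall_before_send(benign))   # False
print(trajectory_features(attack))  # [1.0, 1.0, 1.0, 3.0]
```

Vectors like these could then be passed to a classifier such as scikit-learn's `RandomForestClassifier` (via `fit(X, y)` and `predict_proba(X)`). Because the features depend only on tool names and their order, they would not change when the underlying model changes, which is one plausible reason such a detector could transfer across models without retraining.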
IMPACT This research offers a robust method for securing AI agents against memory poisoning, potentially improving the reliability of AI systems in critical applications.
RANK_REASON The cluster contains a research paper detailing a new method for detecting AI agent memory poisoning.
- AI agents
- arXiv
- email_send_email
- GPT-4.1
- GPT-4o
- memory poisoning
- memory_recall_fact
- prompt injection
- Random Forest
- tool-call logs
AI-generated summary · Google Gemini · from 2 sources.