Frontier AI Safety Regulations: A Reference Guide for AI Company Employees
Researchers are developing new methods to attack and defend AI agents used in software reverse engineering and cybersecurity. One approach uses genetic algorithms to craft malicious prompts that are injected into AI agents, causing them to misinterpret code and allowing detection systems to be bypassed. Other studies focus on detecting and obfuscating these prompt injection attacks, and on defending against multi-step trojan attacks that embed persistent control within agent workflows. Additionally, a framework called CVE-Factory automates the creation of executable vulnerability tasks for training and evaluating code security agents, with significant improvements reported for models such as Qwen3-32B.
IMPACT: New attack vectors and defense mechanisms for AI agents highlight critical security vulnerabilities in AI-powered tools.
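
To make the detection side of this work concrete, here is a minimal sketch of a heuristic scanner that flags instruction-like text embedded in code handed to an analysis agent. It is not the method used in any of the studies summarized above. The pattern list, the `Finding` type, and the `scan_for_injection` function are all hypothetical names chosen for illustration, and a keyword heuristic like this would likely be easy to evade with the obfuscation techniques the summary mentions.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical phrase patterns (an assumption for illustration only).
# A real detector would need far richer signals than a fixed keyword list.
SUSPICIOUS_PATTERNS = [
    r"ignore (all |any )?(previous|prior|above) instructions",
    r"disregard (the )?(previous|prior|above)",
    r"do not (report|flag|mention)",
    r"(classify|treat|mark) this (code|file|function|binary) as (benign|safe)",
]

COMPILED_PATTERNS = [re.compile(p, re.IGNORECASE) for p in SUSPICIOUS_PATTERNS]


@dataclass
class Finding:
    line_number: int  # 1-based line in the scanned source
    pattern: str      # the regex that matched
    text: str         # the offending line, stripped of surrounding whitespace


def scan_for_injection(source: str) -> List[Finding]:
    """Return at most one finding per line that matches a suspicious pattern."""
    findings: List[Finding] = []
    for number, line in enumerate(source.splitlines(), start=1):
        for regex in COMPILED_PATTERNS:
            if regex.search(line):
                findings.append(Finding(number, regex.pattern, line.strip()))
                break  # one finding per line is enough for triage
    return findings
```

A caller could run `scan_for_injection` over decompiled output or source comments before passing them to an agent, and route any flagged lines to human review rather than treating the result as a verdict.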
- GPT-4o
- OWASP
- UK AI Safety Institute
- AgentThreatBench
- Claude Haiku 4.5
- Gemini 3.1 Flash Lite
- MIT AI Risk Repository
- AI agents
- frontier models
- EU AI Act
- LLMs
- GT-HarmBench
- Qwen3-32B
- Claude 4.5
- GPT-5.2-1211-Global
- genetic algorithm
- GPT-5.4
- qwen2.5-coder:14b
- Claude Sonnet 4.6
- prompt injection
- Claude 4.5 Sonnet
- software reverse engineering
- cybersecurity
- CVE-Factory
- Haiku 4.5
- GPT-4o-mini