Brief · PulseAugur

RESEARCH · METR (Model Evaluation & Threat Research) · 4mo · [104 sources]

Frontier AI Safety Regulations: A Reference Guide for AI Company Employees

Researchers are developing new methods to attack and defend AI agents used in software reverse engineering and cybersecurity. One approach uses genetic algorithms to inject malicious prompts into AI agents, causing them to misinterpret code and bypass detection systems. Other studies focus on detecting and obfuscating these prompt injection attacks, and on defending against multi-step trojan attacks that embed persistent control within agent workflows. Additionally, a framework called CVE-Factory automates the creation of executable vulnerability tasks for training and evaluating code security agents, and reports significant improvements for models such as Qwen3-32B.

IMPACT New attack vectors and defense mechanisms for AI agents highlight critical security vulnerabilities in AI-powered tools.

GPT-4o
OWASP
UK AI Safety Institute
AgentThreatBench
Claude Haiku 4.5
Gemini 3.1 Flash Lite
MIT AI Risk Repository
AI agents
frontier models
EU AI Act
LLMs
GT-HarmBench
Qwen3-32B
Claude 4.5
GPT-5.2-1211-Global
genetic algorithm
GPT-5.4
qwen2.5-coder:14b
Claude Sonnet 4.6
prompt injection
Claude 4.5 Sonnet
software reverse engineering
cybersecurity
CVE-Factory
Haiku 4.5
GPT-4o-mini