AgentDojo
PulseAugur coverage of AgentDojo — every cluster mentioning AgentDojo across labs, papers, and developer communities, ranked by signal.
-
Prompt optimization may weaken LLM adversarial robustness, new benchmark suggests
A new benchmark investigates whether prompt optimization techniques for large language models (LLMs) weaken their robustness against adversarial attacks, specifically prompt injection. Initial findi…
-
LLM attack benchmarks cover less than 25% of threat landscape
Researchers have developed a new framework to audit the coverage of benchmarks designed to test Large Language Model (LLM) attacks. This framework, based on a taxonomy of over 500 inference-time attacks, reveals that cu…
-
New protocol enables LLMs to safely control small devices
Researchers have introduced the Device Context Protocol (DCP), a new architecture designed to enable large language models (LLMs) to safely control constrained devices. DCP is significantly more lightweight than existin…
-
Arc Gate offers solution to OpenAI's 'unfixable' prompt injection vulnerability
OpenAI has stated that prompt injection in browser agents is an unfixable structural vulnerability at the model level. However, a new architectural solution called Arc Gate has demonstrated significant success in mitiga…
-
LLM attack benchmarks show significant gaps in security coverage
Researchers have developed a new framework to audit the coverage of LLM attack benchmarks, revealing significant gaps in current evaluations. Their analysis of six public benchmarks showed they collectively cover less t…
-
New attack exploits LLM agent relays, bypassing alignment defenses
Researchers have identified a new vulnerability in LLM agent architectures that use Bring-Your-Own-Key (BYOK) systems. These architectures route LLM traffic through third-party relays, creating an integrity gap where a …
-
New research explores LLM agent evaluation and improvement techniques
Researchers are exploring new methods for evaluating and improving Large Language Model (LLM) agents. One paper introduces semantic early-stopping for iterative LLM loops, aiming to reduce token usage by halting when me…