Web agents
PulseAugur coverage of web agents: every cluster mentioning web agents across labs, papers, and developer communities, ranked by signal.
Sentiment data available for 4 days.
-
New benchmark reveals hidden failure modes in web agents
A new arXiv paper introduces Parallel WebBench, a benchmark designed to evaluate web agents more rigorously by identifying failures beyond final-answer correctness. The study reveals persistent issues such as searc…
-
New Ko-WideSearch benchmark reveals web agents struggle with breadth-search tasks
A new benchmark called Ko-WideSearch has been developed to evaluate the breadth-search capabilities of web agents, focusing on exhaustive set enumeration rather than depth-based question answering. This Korean-language …
-
New framework MUZZLE finds 44 novel attacks on web agents
Researchers have developed MUZZLE, an automated framework designed to test the security of web agents against indirect prompt injection attacks. This system adaptively identifies vulnerable injection points and crafts c…
-
Web agents should adopt typed actions over click-based browsing
A new position paper proposes a shift from low-level, click-based interactions to typed actions for web agents. This approach, termed 'web verbs,' would expose web operations as typed functions with structured inputs an…
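As a rough illustration of the "web verbs" idea summarized above, here is a minimal Python sketch of what one typed action might look like. Everything in it (`SearchProductsInput`, `Product`, `search_products`, the `web_verb` registry, and the stub catalog) is hypothetical and not taken from the position paper; it only contrasts calling a named function with structured inputs and outputs against clicking through a page.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical typed input for one "web verb"; field names are illustrative only.
@dataclass(frozen=True)
class SearchProductsInput:
    query: str
    max_results: int = 10


# Hypothetical structured output returned to the agent instead of raw page HTML.
@dataclass(frozen=True)
class Product:
    name: str
    price_cents: int


# Registry mapping verb names to the typed functions a site might expose.
WEB_VERBS: Dict[str, Callable] = {}


def web_verb(name: str):
    """Decorator that registers a function under a verb name."""
    def register(fn):
        WEB_VERBS[name] = fn
        return fn
    return register


@web_verb("search_products")
def search_products(args: SearchProductsInput) -> List[Product]:
    # Validate structured input up front instead of discovering errors mid-click.
    if not args.query.strip():
        raise ValueError("query must be non-empty")
    if args.max_results < 1:
        raise ValueError("max_results must be positive")
    # Stub catalog standing in for the site's real backend (hypothetical data).
    catalog = [
        Product("USB-C cable", 999),
        Product("USB-C charger", 2499),
        Product("HDMI cable", 1299),
    ]
    hits = [p for p in catalog if args.query.lower() in p.name.lower()]
    return hits[: args.max_results]


def call_verb(name: str, args) -> List[Product]:
    # The agent invokes a named verb with typed arguments; unknown verbs fail loudly.
    if name not in WEB_VERBS:
        raise KeyError(f"unknown web verb: {name}")
    return WEB_VERBS[name](args)
```

For example, `call_verb("search_products", SearchProductsInput(query="usb-c", max_results=1))` returns `[Product(name='USB-C cable', price_cents=999)]`. The agent gets a checked, structured result in one call rather than inferring it from rendered page elements.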
-
TinyFish Vault secures web agent logins without password exposure
TinyFish Vault is a new credential management system designed to allow web agents to access accounts securely. It separates the authentication process from direct password exposure. This enables automated agents to perf…
-
New WARD defense system protects web agents from prompt injection attacks
Researchers have developed WARD, a novel defense system designed to protect web agents from prompt injection attacks. This system addresses limitations of existing guard models, such as poor generalization and high fals…