ENTITY WebARENA

PulseAugur coverage of WebARENA — every cluster mentioning WebARENA across labs, papers, and developer communities, ranked by signal.

Total · 30d: 11 (11 over 90d)

Releases · 30d: 0 (0 over 90d)

Papers · 30d: 10 (10 over 90d)

SENTIMENT · 30D

4 days with sentiment data

RECENT · PAGE 1/1 · 11 TOTAL

RESEARCH · CL_111559 · Jun 25 · 07:02

SkillDisCo framework distills agent traces into reusable procedural skills

Researchers have developed SkillDisCo, a framework designed to distill and compile agent traces into reusable procedural skills. This approach aims to reduce redundant reasoning costs and shorten execution traces by ide…

TOOL · CL_96155 · Jun 17 · 04:00

New framework boosts LLM web agent efficiency with tree-structured reasoning

Researchers have introduced Branch-and-Browse, a new framework designed to enhance the capabilities of large language model (LLM)-powered web agents. This framework addresses limitations in reasoning depth and efficienc…

RESEARCH · CL_95867 · Jun 16 · 08:04

New LLM agent SkillMigrator reuses web skills via layout matching

Researchers have developed SkillMigrator, a novel approach for large language model (LLM) web agents to reuse skills across different websites. Unlike previous methods that relied on instruction similarity or site metad…

RESEARCH · CL_91345 · Jun 15 · 04:00

New AI Research Focuses on Privacy in Agent Collaboration

Two new research papers propose methods for enhancing privacy in AI agent collaborations. The first, DiSan, uses a two-stream encoder to disentangle task semantics from source-identifying style in text, enabling joint t…

TOOL · CL_50807 · May 26 · 04:00

DRIVE framework separates reasoning and interaction skills for web agents

Researchers have developed a new framework called DRIVE to improve the performance of web agents. DRIVE disentangles reasoning skills, which are abstract and transferable, from interaction skills, which are page-specifi…

RESEARCH · CL_32098 · May 14 · 17:05

AI safety evaluations face 'safe-to-dangerous shift' challenge

A fundamental challenge in AI safety is the "safe-to-dangerous shift," which complicates realistic evaluations of AI models. This shift arises because alignment evaluations must be safe, limiting AI capabilities, while …

TOOL · CL_20717 · May 7 · 04:00

cotomi Act agent learns to automate tasks by watching user behavior

Researchers have developed cotomi Act, a browser agent designed to automate work by learning from user actions. The system achieves high task execution accuracy on the WebArena benchmark, surpassing a human baseline. It…

RESEARCH · CL_11758 · May 1 · 04:00

OpAgent achieves 71.6% success rate in web navigation tasks

Researchers have developed OpAgent, a novel web navigation agent that utilizes online reinforcement learning to overcome the limitations of static datasets. The agent employs a hierarchical multi-task fine-tuning approa…

RESEARCH · CL_11685 · May 1 · 04:00

AutoSurfer enhances web agent training with systematic exploration and task synthesis

Researchers have developed AutoSurfer, a novel system designed to generate comprehensive training data for web agents. This system employs a systematic breadth-first exploration strategy to thoroughly map website functi…

RESEARCH · CL_06733 · Apr 28 · 04:00

AgentHER framework boosts LLM agent training with failed trajectory relabeling

Researchers have developed AgentHER, a new framework designed to improve the training of LLM agents by repurposing failed trajectories. The system adapts Hindsight Experience Replay to natural language, identifying alte…

TOOL · CL_02389 · Jan 23 · 10:00

OpenAI launches Operator, an AI agent that browses the web to perform tasks

OpenAI has launched Operator, a new AI agent designed to perform web-based tasks by interacting with websites through its own browser. This agent, powered by a new model called Computer-Using Agent (CUA), can fill forms…