PulseAugur

WebARENA

PulseAugur coverage of WebARENA: every cluster mentioning WebARENA across labs, papers, and developer communities, ranked by signal.

Total · 30 days: 7 (90 days: 7)
Releases · 30 days: 0 (90 days: 0)
Papers · 30 days: 6 (90 days: 6)
Tier distribution · 90 days
Sentiment · 30 days: 2 days with sentiment data

Recent · Page 1 of 1 · 7 items
  1. TOOL · CL_50807 ·

    DRIVE framework separates reasoning and interaction skills for web agents

    Researchers have developed a new framework called DRIVE to improve the performance of web agents. DRIVE disentangles reasoning skills, which are abstract and transferable, from interaction skills, which are page-specifi…

  2. RESEARCH · CL_32098 ·

    AI safety evaluations face 'safe-to-dangerous shift' challenge

    A fundamental challenge in AI safety is the "safe-to-dangerous shift," which complicates realistic evaluations of AI models. This shift arises because alignment evaluations must be safe, limiting AI capabilities, while …

  3. TOOL · CL_20717 ·

    cotomi Act agent learns to automate tasks by watching user behavior

    Researchers have developed cotomi Act, a browser agent designed to automate work by learning from user actions. The system achieves high task execution accuracy on the WebArena benchmark, surpassing a human baseline. It…

  4. RESEARCH · CL_11758 ·

    OpAgent achieves 71.6% success rate in web navigation tasks

    Researchers have developed OpAgent, a novel web navigation agent that utilizes online reinforcement learning to overcome the limitations of static datasets. The agent employs a hierarchical multi-task fine-tuning approa…

  5. RESEARCH · CL_11685 ·

    AutoSurfer enhances web agent training with systematic exploration and task synthesis

    Researchers have developed AutoSurfer, a novel system designed to generate comprehensive training data for web agents. This system employs a systematic breadth-first exploration strategy to thoroughly map website functi…

  6. RESEARCH · CL_06733 ·

    AgentHER framework boosts LLM agent training with failed trajectory relabeling

    Researchers have developed AgentHER, a new framework designed to improve the training of LLM agents by repurposing failed trajectories. The system adapts Hindsight Experience Replay to natural language, identifying alte…

  7. TOOL · CL_02389 ·

    OpenAI launches Operator, an AI agent that browses the web to perform tasks

    OpenAI has launched Operator, a new AI agent designed to perform web-based tasks by interacting with websites through its own browser. This agent, powered by a new model called Computer-Using Agent (CUA), can fill forms…