PulseAugur

OSWorld

PulseAugur coverage of OSWorld — every cluster mentioning OSWorld across labs, papers, and developer communities, ranked by signal.

Total · 30d: 11 (11 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 9 (9 over 90d)

RECENT · 11 TOTAL
  1. RESEARCH · CL_107758

    New RL framework uses vision-language models for GUI agent supervision

    Researchers have developed a new reinforcement learning framework for Computer-Use Agents (CUAs) that leverages autonomous vision-language evaluation for supervision. This approach addresses the challenge of obtaining s…

  2. COMMENTARY · CL_104609

    AI agents achieve 66% success on desktop tasks, but data gaps remain a challenge

    Computer-use agents have shown significant progress, with success rates on the OSWorld benchmark jumping from 12% to 66% in about a year. This rapid advancement was highlighted by Microsoft's Build 2026 keynote, which p…

  3. RESEARCH · CL_91414

    New benchmarks probe AI agent safety against deceptive interfaces and unsafe actions

    Two new research papers introduce benchmarks for evaluating the safety of AI agents. OSGuard focuses on computer-use agents, distinguishing between safe and unsafe actions and identifying latent hazards in task executio…

  4. RESEARCH · CL_95769

    New ProCUA-SFT dataset boosts AI agent desktop performance

    Researchers have developed ProCUA-SFT, a new dataset designed to improve the training of computer-use agents (CUAs) that interact with graphical desktop environments. Existing datasets like AgentNet have shown negative …

  5. RESEARCH · CL_81266

    AI memory systems can harm performance, research finds

    New research indicates that AI memory systems, while intended to improve user experience and task completion, can paradoxically degrade model performance and foster sycophantic tendencies. Studies show that these system…

  6. TOOL · CL_77253

    New MacArena benchmark tests AI agents on macOS

    Researchers have developed MacArena, a new benchmark designed to evaluate computer-use agents (CUAs) operating within a macOS environment. This benchmark includes 421 tasks across 50 applications, specifically tailored …

  7. SIGNIFICANT · CL_66950

    Hcompany ships Holo3.1 agents for fast, local computer use

    Hcompany has released Holo3.1, a new family of computer-use agents designed for robust performance across various environments and agent frameworks. This release emphasizes local inference capabilities, offering quantiz…

  8. RESEARCH · CL_58867

    New benchmark and data synthesis boost GUI agent error recovery

    Researchers have developed a new benchmark and data synthesis framework to improve the error recovery capabilities of GUI agents. The benchmark, GUI-RobustEval, includes over 1,200 test cases to systematically measure h…

  9. RESEARCH · CL_48787

    New frameworks aim to improve AI understanding of user intent

    Two new research papers introduce computational frameworks for understanding and controlling user intent in AI interactions. The first, 'Intent Signal Theory,' formalizes the distinction between a user's latent intent a…

  10. RESEARCH · CL_32098

    AI safety evaluations face 'safe-to-dangerous shift' challenge

    A fundamental challenge in AI safety is the "safe-to-dangerous shift," which complicates realistic evaluations of AI models. This shift arises because alignment evaluations must be safe, limiting AI capabilities, while …

  11. RESEARCH · CL_01260

    Holo1: New family of GUI automation VLMs powering GUI agent Surfer-H

    Researchers have introduced A11y-Compressor, a framework designed to make GUI agent observations more efficient by transforming linearized accessibility trees into structured representations. This method reduces input t…