ENTITY OSWorld

PulseAugur coverage of OSWorld: every cluster mentioning OSWorld across labs, papers, and developer communities, ranked by signal.

Total · 30d: 11 (11 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 9 (9 over 90d)

TIER MIX · 90D

significant 1
research 8
tool 1
commentary 1

SENTIMENT · 30D

5 days with sentiment data

RECENT · PAGE 1/1 · 11 TOTAL

RESEARCH · CL_107758 · Jun 23 · 12:46

New RL framework uses vision-language models for GUI agent supervision

Researchers have developed a new reinforcement learning framework for Computer-Use Agents (CUAs) that leverages autonomous vision-language evaluation for supervision. This approach addresses the challenge of obtaining s…

COMMENTARY · CL_104609 · Jun 23 · 02:04

AI agents achieve 66% success on desktop tasks, but data gaps remain a challenge

Computer-use agents have shown significant progress, with success rates on the OSWorld benchmark jumping from 12% to 66% in about a year. This rapid advancement was highlighted by Microsoft's Build 2026 keynote, which p…

RESEARCH · CL_91414 · Jun 15 · 04:00

New benchmarks probe AI agent safety against deceptive interfaces and unsafe actions

Two new research papers introduce benchmarks for evaluating the safety of AI agents. OSGuard focuses on computer-use agents, distinguishing between safe and unsafe actions and identifying latent hazards in task executio…

RESEARCH · CL_95769 · Jun 15 · 00:00

New ProCUA-SFT dataset boosts AI agent desktop performance

Researchers have developed ProCUA-SFT, a new dataset designed to improve the training of computer-use agents (CUAs) that interact with graphical desktop environments. Existing datasets like AgentNet have shown negative …

RESEARCH · CL_81266 · Jun 9 · 15:19

AI Memory Systems Can Harm Performance, Research Finds

New research indicates that AI memory systems, while intended to improve user experience and task completion, can paradoxically degrade model performance and foster sycophantic tendencies. Studies show that these system…

TOOL · CL_77253 · Jun 8 · 04:00

New MacArena benchmark tests AI agents on macOS

Researchers have developed MacArena, a new benchmark designed to evaluate computer-use agents (CUAs) operating within a macOS environment. This benchmark includes 421 tasks across 50 applications, specifically tailored …

SIGNIFICANT · CL_66950 · Jun 2 · 14:13

H Company ships Holo3.1 agents for fast, local computer use

H Company has released Holo3.1, a new family of computer-use agents designed for robust performance across various environments and agent frameworks. This release emphasizes local inference capabilities, offering quantiz…

RESEARCH · CL_58867 · May 28 · 00:00

New benchmark and data synthesis boost GUI agent error recovery

Researchers have developed a new benchmark and data synthesis framework to improve the error recovery capabilities of GUI agents. The benchmark, GUI-RobustEval, includes over 1,200 test cases to systematically measure h…

RESEARCH · CL_48787 · May 25 · 04:00

New frameworks aim to improve AI understanding of user intent

Two new research papers introduce computational frameworks for understanding and controlling user intent in AI interactions. The first, 'Intent Signal Theory,' formalizes the distinction between a user's latent intent a…

RESEARCH · CL_32098 · May 14 · 17:05

AI safety evaluations face 'safe-to-dangerous shift' challenge

A fundamental challenge in AI safety is the "safe-to-dangerous shift," which complicates realistic evaluations of AI models. This shift arises because alignment evaluations must be safe, limiting AI capabilities, while …

RESEARCH · CL_01260 · Jun 3 · 13:27

Holo1: New family of GUI automation VLMs powering GUI agent Surfer-H

Researchers have introduced A11y-Compressor, a framework designed to make GUI agent observations more efficient by transforming linearized accessibility trees into structured representations. This method reduces input t…