GUI agents

PulseAugur coverage of GUI agents: every cluster mentioning GUI agents across labs, papers, and developer communities, ranked by signal.

Total · 30d: 11 (11 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 11 (11 over 90d)

RELATIONSHIPS

used by AndroidWorld 70%

SENTIMENT · 30D

9 days with sentiment data

RECENT · PAGE 1/1 · 11 TOTAL

TOOL · CL_77283 · Jun 8 · 04:00

New EVA framework evolves semantic attacks on GUI agents

Researchers have developed EVA, an evolutionary framework designed to identify semantic vulnerabilities in GUI agents powered by multimodal large language models (MLLMs). This method focuses on manipulating the semantic…

RESEARCH · CL_77162 · Jun 5 · 08:17

StainFlow improves GUI agent training with novel reward model

Researchers have introduced StainFlow, a novel process reward model designed to enhance the training of GUI agents. This method addresses the sparsity of feedback in reinforcement learning by providing finer-grained tra…

RESEARCH · CL_72502 · Jun 4 · 15:57

New DragOn dataset boosts GUI agent drag-and-drop capabilities

Researchers have introduced DragOn, a new benchmark and dataset designed to improve the performance of GUI agents in handling drag-based interactions. The dataset includes 286,000 training screenshots and 3.5 million tr…

RESEARCH · CL_70430 · Jun 3 · 10:25

New benchmark tests AI agents on dynamic short-video platforms

Researchers have introduced "LivingScreen," a new benchmark designed to evaluate GUI agents on dynamic short-video platforms. Unlike previous benchmarks that assume static screens, LivingScreen accounts for continuously…

RESEARCH · CL_58867 · May 28 · 00:00

New benchmark and data synthesis boost GUI agent error recovery

Researchers have developed a new benchmark and data synthesis framework to improve the error recovery capabilities of GUI agents. The benchmark, GUI-RobustEval, includes over 1,200 test cases to systematically measure h…

RESEARCH · CL_56333 · May 27 · 00:00

New method GUI-CIDER boosts GUI agent knowledge

Researchers have developed GUI-CIDER, a novel mid-training method designed to enhance the world knowledge of GUI agents built with multimodal large language models. This approach explicitly internalizes GUI operational …

TOOL · CL_48788 · May 24 · 00:00

Mobile world model enhances GUI agents with multimodal predictions

Researchers have developed a novel approach using a "mobile world model" to enhance the capabilities of GUI agents. This model explores four modalities (delta text, full text, diffusion-based images, and renderable code)…

TOOL · CL_41190 · May 19 · 07:35

New CutVerse benchmark reveals GUI agents struggle with media editing tasks

Researchers have introduced CutVerse, a new benchmark designed to assess the capabilities of GUI agents in media post-production tasks. The benchmark features over 180 complex tasks across seven professional application…

TOOL · CL_49337 · May 19 · 02:13

New AQuaUI method slashes GUI agent visual tokens

Researchers have developed AQuaUI, a novel method to reduce the number of visual tokens processed by Large Multimodal Models (LMMs) when interacting with graphical user interfaces (GUIs). This training-free technique co…

TOOL · CL_38685 · May 18 · 08:36

DocOS benchmark tests GUI agents' ability to use online docs

Researchers have introduced DocOS, a new benchmark designed to evaluate GUI agents' ability to proactively use online documentation for task completion. Current GUI agents struggle with tasks requiring procedural knowle…

TOOL · CL_28329 · May 11 · 10:49

Mobile GUI agents guided by new world models trained on code and text

Researchers have developed a novel approach to enhance mobile GUI agents by training world models across four modalities: delta text, full text, diffusion-based images, and renderable code. These models achieved state-o…