GUI agents
PulseAugur coverage of GUI agents: every cluster mentioning GUI agents across labs, papers, and developer communities, ranked by signal.
-
New EVA framework evolves semantic attacks on GUI agents
Researchers have developed EVA, an evolutionary framework designed to identify semantic vulnerabilities in GUI agents powered by multimodal large language models (MLLMs). This method focuses on manipulating the semantic…
-
StainFlow improves GUI agent training with novel reward model
Researchers have introduced StainFlow, a novel process reward model designed to enhance the training of GUI agents. This method addresses the sparsity of feedback in reinforcement learning by providing finer-grained tra…
-
New DragOn dataset boosts GUI agent drag-and-drop capabilities
Researchers have introduced DragOn, a new benchmark and dataset designed to improve the performance of GUI agents in handling drag-based interactions. The dataset includes 286,000 training screenshots and 3.5 million tr…
-
New benchmark tests AI agents on dynamic short-video platforms
Researchers have introduced "LivingScreen," a new benchmark designed to evaluate GUI agents on dynamic short-video platforms. Unlike previous benchmarks that assume static screens, LivingScreen accounts for continuously…
-
New benchmark and data synthesis boost GUI agent error recovery
Researchers have developed a new benchmark and data synthesis framework to improve the error recovery capabilities of GUI agents. The benchmark, GUI-RobustEval, includes over 1,200 test cases to systematically measure h…
-
New GUI-CIDER method boosts GUI agent knowledge
Researchers have developed GUI-CIDER, a novel mid-training method designed to enhance the world knowledge of GUI agents built with multimodal large language models. This approach explicitly internalizes GUI operational …
-
Mobile world model enhances GUI agents with multimodal predictions
Researchers have developed a novel approach using a "mobile world model" to enhance the capabilities of GUI agents. This model explores four modalities—delta text, full text, diffusion-based images, and renderable code—…
-
New CutVerse benchmark reveals GUI agents struggle with media editing tasks
Researchers have introduced CutVerse, a new benchmark designed to assess the capabilities of GUI agents in media post-production tasks. The benchmark features over 180 complex tasks across seven professional application…
-
New AQuaUI method slashes GUI agent visual tokens
Researchers have developed AQuaUI, a novel method to reduce the number of visual tokens processed by Large Multimodal Models (LMMs) when interacting with graphical user interfaces (GUIs). This training-free technique co…
-
DocOS benchmark tests GUI agents' ability to use online docs
Researchers have introduced DocOS, a new benchmark designed to evaluate GUI agents' ability to proactively use online documentation for task completion. Current GUI agents struggle with tasks requiring procedural knowle…
-
Mobile GUI agents guided by new world models trained on code and text
Researchers have developed a novel approach to enhance mobile GUI agents by training world models across four modalities: delta text, full text, diffusion-based images, and renderable code. These models achieved state-o…