Brief

last 24h

[3/3] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

TOOL · Hugging Face Daily Papers English(EN) · 1w

CutVerse: A Compositional GUI Agents Benchmark for Media Post-Production Editing

Researchers have introduced CutVerse, a benchmark for assessing GUI agents on media post-production tasks. It comprises over 180 complex tasks across seven professional applications, including Premiere Pro and Photoshop, that require dense multimodal interaction. Current agents reach only a 36% success rate on these realistic editing workflows, which points to weaknesses in long-horizon reliability and domain-specific planning.

IMPACT The CutVerse benchmark highlights significant challenges for current GUI agents in complex media editing, suggesting a need for improved long-horizon planning and domain-specific capabilities.

TOOL · Hugging Face Daily Papers English(EN) · 2d · [5 sources]

SimuWoB: Simulating Real-World Mobile Apps for Fast and Faithful GUI Agent Benchmarking

Researchers have developed a "mobile world model" approach to improve GUI agents. The model explores four modalities for predicting the consequences of actions in mobile interfaces: delta text, full text, diffusion-based images, and renderable code. The findings indicate that renderable code offers high fidelity on in-distribution tasks, while text-based feedback is more robust for online execution. Trajectories generated by these world models can improve agent performance by providing transferable interaction experience, though they may not perfectly preserve the original data distribution. The research also suggests that for agents prone to overconfidence, world models are more effective as prior perception or training supervision than as post-hoc verifiers.

IMPACT Enhances GUI agent reliability and task performance through multimodal world modeling and transferable interaction experience.

TOOL · arXiv cs.MA (Multiagent) English(EN) · 1w

AQuaUI: Visual Token Reduction for GUI Agents with Adaptive Quadtrees

Researchers have developed AQuaUI, a training-free method that reduces the number of visual tokens Large Multimodal Models (LMMs) process when interacting with graphical user interfaces (GUIs). It builds an adaptive quadtree over each GUI screenshot so that a region of low information density is represented by a single token while spatial relationships are preserved. AQuaUI also includes a conditional algorithm that uses consecutive screenshots to maintain temporal consistency, improving the accuracy-efficiency trade-off of GUI agent models.

IMPACT Reduces computational load for GUI agents, potentially enabling faster and more efficient AI-driven user interfaces.
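
To make the adaptive-quadtree idea in the AQuaUI summary concrete, here is a minimal sketch, not the paper's implementation. It assumes pixel variance as a stand-in for "information density" (the summary does not state the actual criterion), and the function name `quadtree_leaves`, the threshold, and the toy 8x8 "screenshot" are all hypothetical. Each leaf region would correspond to one visual token; the temporal, consecutive-screenshot part of the method is not covered.

```python
from statistics import pvariance


def quadtree_leaves(img, x0, y0, w, h, var_threshold=25.0, min_size=2):
    """Return leaf regions (x, y, w, h) of an adaptive quadtree over img.

    img is a 2D list of grayscale values. A region stops splitting when it
    is small enough or when its pixel variance is low. Variance is an
    ASSUMED proxy for information density, not the paper's criterion.
    """
    pixels = [img[y][x] for y in range(y0, y0 + h) for x in range(x0, x0 + w)]
    if w <= min_size or h <= min_size or pvariance(pixels) <= var_threshold:
        return [(x0, y0, w, h)]

    # Split into four quadrants and recurse into each one.
    hw, hh = w // 2, h // 2
    quadrants = [
        (x0, y0, hw, hh),
        (x0 + hw, y0, w - hw, hh),
        (x0, y0 + hh, hw, h - hh),
        (x0 + hw, y0 + hh, w - hw, h - hh),
    ]
    leaves = []
    for cx, cy, cw, ch in quadrants:
        leaves += quadtree_leaves(img, cx, cy, cw, ch, var_threshold, min_size)
    return leaves


# Toy 8x8 "screenshot": flat background with a busy checkerboard
# in the top-left 4x4 corner (hypothetical data for illustration).
screenshot = [[200] * 8 for _ in range(8)]
for y in range(4):
    for x in range(4):
        screenshot[y][x] = 255 if (x + y) % 2 else 0

leaves = quadtree_leaves(screenshot, 0, 0, 8, 8)
print(len(leaves), "regions instead of", 8 * 8, "pixels")
```

In this toy case the three flat quadrants each collapse to a single region, while the busy corner is subdivided down to the minimum size, which is the behavior the summary describes for low-density versus high-density areas.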
