AQuaUI: Visual Token Reduction for GUI Agents with Adaptive Quadtrees
Researchers have developed AQuaUI, a training-free method for reducing the number of visual tokens that Large Multimodal Models (LMMs) must process when interacting with graphical user interfaces (GUIs). AQuaUI builds an adaptive quadtree over each GUI screenshot so that regions of low information density are represented by a single token, while the spatial relationships between regions are preserved. It also incorporates a conditional algorithm that uses consecutive screenshots to maintain temporal consistency, which the authors report yields improved accuracy-efficiency trade-offs for GUI agent models.
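To make the quadtree idea concrete, here is a minimal sketch. It is not AQuaUI's implementation, and it rests on several assumptions. The screenshot is a square grayscale grid whose side is a power of two. Pixel variance stands in for "information density". The names `Leaf` and `build_quadtree` are hypothetical. A region is split into four quadrants only while its variance exceeds a threshold, so uniform areas collapse into one leaf. Each leaf keeps its (x, y, size) box, which is how position information survives the reduction. The consecutive-screenshot conditional algorithm is not modeled here.

```python
from dataclasses import dataclass
from typing import List, Optional

Image = List[List[float]]  # assumed: square grayscale grid, power-of-two side


@dataclass(frozen=True)
class Leaf:
    """One merged region; in an LMM this would map to a single visual token."""
    x: int       # left column of the region
    y: int       # top row of the region
    size: int    # side length of the square region
    mean: float  # mean intensity, a stand-in for the region's pooled feature


def region_stats(img: Image, x: int, y: int, size: int):
    """Return (mean, variance) of the pixels in a square region."""
    vals = [img[r][c] for r in range(y, y + size) for c in range(x, x + size)]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, var


def build_quadtree(img: Image, x: int = 0, y: int = 0, size: Optional[int] = None,
                   threshold: float = 1.0, min_size: int = 1) -> List[Leaf]:
    """Recursively split regions whose variance exceeds `threshold`.

    Variance is an assumed proxy for information density; low-variance
    regions become a single leaf regardless of their size.
    """
    if size is None:
        size = len(img)
        assert size > 0 and size & (size - 1) == 0, "side must be a power of two"
    mean, var = region_stats(img, x, y, size)
    if var <= threshold or size <= min_size:
        return [Leaf(x, y, size, mean)]
    half = size // 2
    leaves: List[Leaf] = []
    # Visit quadrants in order: top-left, top-right, bottom-left, bottom-right.
    for dy in (0, half):
        for dx in (0, half):
            leaves.extend(build_quadtree(img, x + dx, y + dy, half, threshold, min_size))
    return leaves


# Usage: an 8x8 blank "screen" with one bright pixel in the top-left corner.
img = [[0.0] * 8 for _ in range(8)]
img[0][0] = 255.0
leaves = build_quadtree(img)
print(len(leaves))  # 10 leaves instead of 64 uniform cells
```

The three blank 4x4 quadrants and three blank 2x2 cells each collapse to one leaf, and only the neighborhood of the bright pixel is refined down to single cells. A real system would presumably operate on the model's patch grid and pool patch embeddings per leaf rather than raw pixel means.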
IMPACT Reduces the computational load of GUI agents, potentially enabling faster and more efficient AI-driven interaction with user interfaces.