Researchers have developed GoClick, a lightweight vision-language model designed for precise GUI element grounding on resource-constrained devices. Unlike existing large models, GoClick uses an encoder-decoder architecture and a progressive data refinement pipeline to achieve high accuracy with significantly fewer parameters. This approach enables GUI agents to run on-device, reducing latency and improving performance, and has shown success when integrated into device-cloud collaboration frameworks.
Impact: Enables on-device GUI interaction for agents, potentially improving mobile app automation and accessibility.
Ranking rationale: Academic paper introducing a new lightweight model for GUI element grounding.
AI-generated summary · Google Gemini · From 1 source.