PulseAugur
English(EN) One Forward Beats Two: InnerZoom for Accurate and Efficient GUI Grounding

InnerZoom framework achieves SOTA GUI grounding in a single forward pass · Tracking 3 sources

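Researchers have developed InnerZoom, a new framework for accurate and efficient GUI grounding in a single forward pass. The method addresses a limitation of existing multimodal large language model (MLLM) approaches by preserving target-region awareness across decoder layers, which is critical for generating precise coordinates in GUI interaction. InnerZoom achieves state-of-the-art performance on multiple benchmarks, improving accuracy while reducing computational cost and latency.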

Impact: This new method could improve the efficiency and accuracy with which AI agents interact with graphical user interfaces.

Ranking rationale: This cluster covers a recent research paper detailing a new GUI grounding method.

AI-generated summary · Google Gemini · From 3 sources.

Sources [3]

  1. Hugging Face Daily Papers TIER_1 English(EN) ·

    One Forward Beats Two: InnerZoom for Accurate and Efficient GUI Grounding

    InnerZoom addresses GUI grounding challenges by preserving target-region awareness across decoder layers through a single-forward pass that bridges cross-layer evidence, achieving state-of-the-art performance with reduced computational cost.

  2. arXiv cs.CV TIER_1 English(EN) · Chen Liu, Ling Chen, Hanzhang Zhou, Liangyu Chen, Chenglin Cai, Xin Yu, Steven Hoi, Yue Wang ·

    One Forward Beats Two: InnerZoom for Accurate and Efficient GUI Grounding

    arXiv:2606.30084v1 Announce Type: new Abstract: MLLM-based GUI grounding methods commonly formulate target localization as autoregressive coordinate generation, enabling models to leverage the strong instruction-following and semantic understanding capabilities of MLLMs. However,…

  3. arXiv cs.CV TIER_1 English(EN) · Yue Wang ·

    One Forward Beats Two: InnerZoom for Accurate and Efficient GUI Grounding

    MLLM-based GUI grounding methods commonly formulate target localization as autoregressive coordinate generation, enabling models to leverage the strong instruction-following and semantic understanding capabilities of MLLMs. However, this formulation requires the model to retain r…