PulseAugur

New methods improve VLM accuracy on GUI grounding tasks · 2 papers

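Two new research papers introduce methods for improving the accuracy and reliability of vision-language models (VLMs) on GUI grounding tasks. The first, "Trust the Right Teacher," proposes a quality-aware self-distillation method that refines the teacher signal by applying correctness-aware gating and probability scaling to handle unreliable coordinate-token predictions. The second, "VISTA," proposes a view-consistent self-verified training framework that uses multiple semantically equivalent views of a GUI to stabilize reinforcement learning and improve coordinate-generation accuracy, reporting significant gains on Qwen backbones.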

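Impact: These advances in GUI grounding could make AI interaction with user interfaces more precise and reliable, improving automation and user experience.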

Ranking rationale: Two distinct research papers that introduce novel methodologies for a specific AI task.

Read on Hugging Face Daily Papers →

AI-generated summary · Google Gemini · from 6 sources. How we write summaries →


Sources [6]

  1. arXiv cs.AI TIER_1 English(EN) · Jingyuan Huang, Zuming Huang, Yucheng Shi, Tianze Yang, Xiaoming Zhai, Wei Chu, Ninghao Liu ·

    Trust the Right Teacher: Quality-Aware Self-Distillation for GUI Grounding

    arXiv:2606.18101v1. Abstract: Graphical user interface (GUI) grounding requires vision-language models (VLMs) to identify small target elements in high-resolution screenshots and predict precise screen coordinates. On-policy self-distillation (OPSD) is a promisi…

  2. arXiv cs.AI TIER_1 English(EN) · Ninghao Liu ·

    Trust the Right Teacher: Quality-Aware Self-Distillation for GUI Grounding

    Graphical user interface (GUI) grounding requires vision-language models (VLMs) to identify small target elements in high-resolution screenshots and predict precise screen coordinates. On-policy self-distillation (OPSD) is a promising post-training approach for this coordinate-se…

  3. Hugging Face Daily Papers TIER_1 English(EN) ·

    Trust the Right Teacher: Quality-Aware Self-Distillation for GUI Grounding

    Quality-aware self-distillation improves vision-language model performance for GUI grounding by enhancing coordinate-token teacher signals through correctness-aware gating and probability scaling.

  4. arXiv cs.AI TIER_1 English(EN) · Xinyu Qiu, Yunzhu Zhang, Heng Jia, Shuheng Shen, Changhua Meng, Linchao Zhu ·

    VISTA: View-Consistent Self-Verified Training for GUI Grounding

    arXiv:2606.14579v1. Abstract: When applying Group Relative Policy Optimization (GRPO) for GUI Grounding, rollouts are sampled from a single screenshot view; groups often become either all failures on difficult instances or all successes on easy ones, yielding no…

  5. arXiv cs.AI TIER_1 English(EN) · Linchao Zhu ·

    VISTA: View-Consistent Self-Verified Training for GUI Grounding

    When applying Group Relative Policy Optimization (GRPO) for GUI Grounding, rollouts are sampled from a single screenshot view; groups often become either all failures on difficult instances or all successes on easy ones, yielding no useful relative advantage. We propose VISTA (Vi…

  6. Hugging Face Daily Papers TIER_1 English(EN) ·

    VISTA: View-Consistent Self-Verified Training for GUI Grounding

    VISTA is a GRPO-based training framework for GUI grounding that uses multiple consistent views of the same GUI instance to improve training stability and accuracy.
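
The VISTA abstract quoted in sources 4 and 5 says that GRPO rollout groups on a single screenshot often end up all failures or all successes, which leaves no useful relative advantage. The sketch below illustrates only that failure mode, not VISTA's multi-view remedy. It assumes a binary reward (1 if the predicted click lands inside the target element's bounding box, else 0) and the common mean/standard-deviation normalization of rewards within a group. The bounding box, the sampled points, and the function names are hypothetical and are not taken from either paper.

```python
from statistics import mean, pstdev


def point_in_box(point, box):
    """Return 1.0 if the click (x, y) lies inside box (x0, y0, x1, y1), else 0.0."""
    x, y = point
    x0, y0, x1, y1 = box
    return 1.0 if x0 <= x <= x1 and y0 <= y <= y1 else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one rollout group (GRPO-style normalization)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical target element, e.g. a small button on a screenshot.
target = (100, 200, 140, 220)

# Hard instance: every sampled rollout misses the target.
hard_rollouts = [(10, 10), (500, 40), (90, 230), (300, 300)]
# Mixed instance: two rollouts hit the target, two miss.
mixed_rollouts = [(120, 210), (500, 40), (105, 205), (300, 300)]

hard_adv = group_relative_advantages([point_in_box(p, target) for p in hard_rollouts])
mixed_adv = group_relative_advantages([point_in_box(p, target) for p in mixed_rollouts])

print("all-failure group:", hard_adv)   # every advantage is zero, so no gradient signal
print("mixed group:      ", mixed_adv)  # hits get positive advantage, misses negative
```

An all-success group behaves the same way as the all-failure one: identical rewards give zero deviation from the group mean, so every advantage is zero and the update carries no signal.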