VISTA: View-Consistent Self-Verified Training for GUI Grounding

New methods improve VLM accuracy on GUI grounding tasks · 2 papers

By the PulseAugur editorial team · [6 sources] · 2026-06-12 00:00

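Two new research papers introduce methods for improving the accuracy and reliability of vision-language models (VLMs) on GUI grounding tasks. The first, "Trust the Right Teacher," proposes a quality-aware self-distillation method that refines the teacher signal by applying correctness-aware gating and probability scaling to unreliable coordinate-token predictions. The second, "VISTA," proposes a view-consistent self-verified training framework that uses multiple semantically equivalent views of a GUI to stabilize reinforcement learning and improve coordinate-generation accuracy, and it reports significant gains on Qwen backbones.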

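Impact: These advances in GUI grounding may lead to more precise and reliable interaction between AI systems and user interfaces, improving automation and user experience.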

Ranking rationale: Two separate research papers that introduce novel methodologies for a specific AI task.


AI-generated summary · Google Gemini · from 6 sources.

Sources [6]

arXiv cs.AI TIER_1 English(EN) · Jingyuan Huang, Zuming Huang, Yucheng Shi, Tianze Yang, Xiaoming Zhai, Wei Chu, Ninghao Liu · 2026-06-17 04:00

Trust the Right Teacher: Quality-Aware Self-Distillation for GUI Grounding

arXiv:2606.18101v1. Abstract: Graphical user interface (GUI) grounding requires vision-language models (VLMs) to identify small target elements in high-resolution screenshots and predict precise screen coordinates. On-policy self-distillation (OPSD) is a promisi…
arXiv cs.AI TIER_1 English(EN) · Ninghao Liu · 2026-06-16 16:02

Trust the Right Teacher: Quality-Aware Self-Distillation for GUI Grounding

Graphical user interface (GUI) grounding requires vision-language models (VLMs) to identify small target elements in high-resolution screenshots and predict precise screen coordinates. On-policy self-distillation (OPSD) is a promising post-training approach for this coordinate-se…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-16 00:00

Trust the Right Teacher: Quality-Aware Self-Distillation for GUI Grounding

Quality-aware self-distillation improves vision-language model performance for GUI grounding by enhancing coordinate-token teacher signals through correctness-aware gating and probability scaling.
arXiv cs.AI TIER_1 English(EN) · Xinyu Qiu, Yunzhu Zhang, Heng Jia, Shuheng Shen, Changhua Meng, Linchao Zhu · 2026-06-15 04:00

VISTA: View-Consistent Self-Verified Training for GUI Grounding

arXiv:2606.14579v1. Abstract: When applying Group Relative Policy Optimization (GRPO) for GUI Grounding, rollouts are sampled from a single screenshot view; groups often become either all failures on difficult instances or all successes on easy ones, yielding no…
arXiv cs.AI TIER_1 English(EN) · Linchao Zhu · 2026-06-12 15:58

VISTA: View-Consistent Self-Verified Training for GUI Grounding

When applying Group Relative Policy Optimization (GRPO) for GUI Grounding, rollouts are sampled from a single screenshot view; groups often become either all failures on difficult instances or all successes on easy ones, yielding no useful relative advantage. We propose VISTA (Vi…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-12 00:00

VISTA: View-Consistent Self-Verified Training for GUI Grounding

VISTA is a GRPO-based training framework for GUI grounding that uses multiple consistent views of the same GUI instance to improve training stability and accuracy.
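
The VISTA abstract's point about uninformative groups can be illustrated with a minimal sketch of a GRPO-style group-relative advantage, assuming the common formulation in which each rollout's reward is standardized against its group's mean and standard deviation. The function name `group_relative_advantages`, the binary hit/miss rewards, and the `eps` constant are illustrative assumptions, not details taken from either paper. How VISTA itself builds groups from multiple views is not shown in the truncated abstract, so it is not modeled here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one rollout group (GRPO-style sketch).

    `eps` is an assumed small constant that avoids division by zero
    when every reward in the group is identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical binary rewards: 1.0 if the predicted click lands on the target.
all_fail = group_relative_advantages([0.0, 0.0, 0.0, 0.0])  # hard instance
all_pass = group_relative_advantages([1.0, 1.0, 1.0, 1.0])  # easy instance
mixed = group_relative_advantages([1.0, 0.0, 0.0, 1.0])     # informative group

print(all_fail)  # every advantage is zero: no learning signal
print(all_pass)  # every advantage is zero: no learning signal
print(mixed)     # hits get positive advantages, misses get negative ones
```

When every rollout in a group receives the same reward, each advantage is zero and the group contributes no gradient signal; only a group with mixed outcomes separates good rollouts from bad ones.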

