PulseAugur

New methods enhance VLM accuracy for GUI grounding tasks · 2 papers

Two new research papers propose methods for improving the accuracy and reliability of vision-language models (VLMs) on GUI grounding, the task of identifying a target element in a screenshot and predicting its screen coordinates. The first paper, "Trust the Right Teacher," proposes quality-aware self-distillation, which refines teacher signals with correctness-aware gating and probability scaling to handle unreliable coordinate-token predictions. The second paper, "VISTA," presents a view-consistent self-verified training framework that uses multiple semantically equivalent views of the same GUI to stabilize reinforcement learning and improve coordinate generation accuracy; the summary reports significant gains on Qwen backbones.

IMPACT These advancements in GUI grounding could lead to more precise and reliable AI interactions with user interfaces, improving automation and user experience.

RANK_REASON Two distinct research papers introducing novel methodologies for a specific AI task.


AI-generated summary · Google Gemini · from 6 sources.

COVERAGE [6]

  1. arXiv cs.AI TIER_1 English(EN) · Jingyuan Huang, Zuming Huang, Yucheng Shi, Tianze Yang, Xiaoming Zhai, Wei Chu, Ninghao Liu

    Trust the Right Teacher: Quality-Aware Self-Distillation for GUI Grounding

    arXiv:2606.18101v1. Graphical user interface (GUI) grounding requires vision-language models (VLMs) to identify small target elements in high-resolution screenshots and predict precise screen coordinates. On-policy self-distillation (OPSD) is a promisi…

  2. arXiv cs.AI TIER_1 English(EN) · Ninghao Liu

    Trust the Right Teacher: Quality-Aware Self-Distillation for GUI Grounding

    Graphical user interface (GUI) grounding requires vision-language models (VLMs) to identify small target elements in high-resolution screenshots and predict precise screen coordinates. On-policy self-distillation (OPSD) is a promising post-training approach for this coordinate-se…

  3. Hugging Face Daily Papers TIER_1 English(EN)

    Trust the Right Teacher: Quality-Aware Self-Distillation for GUI Grounding

    Quality-aware self-distillation improves vision-language model performance for GUI grounding by enhancing coordinate-token teacher signals through correctness-aware gating and probability scaling.

  4. arXiv cs.AI TIER_1 English(EN) · Xinyu Qiu, Yunzhu Zhang, Heng Jia, Shuheng Shen, Changhua Meng, Linchao Zhu

    VISTA: View-Consistent Self-Verified Training for GUI Grounding

    arXiv:2606.14579v1. When applying Group Relative Policy Optimization (GRPO) for GUI Grounding, rollouts are sampled from a single screenshot view; groups often become either all failures on difficult instances or all successes on easy ones, yielding no…

  5. arXiv cs.AI TIER_1 English(EN) · Linchao Zhu

    VISTA: View-Consistent Self-Verified Training for GUI Grounding

    When applying Group Relative Policy Optimization (GRPO) for GUI Grounding, rollouts are sampled from a single screenshot view; groups often become either all failures on difficult instances or all successes on easy ones, yielding no useful relative advantage. We propose VISTA (Vi…

  6. Hugging Face Daily Papers TIER_1 English(EN)

    VISTA: View-Consistent Self-Verified Training for GUI Grounding

    VISTA is a GRPO-based training framework for GUI grounding that uses multiple consistent views of the same GUI instance to improve training stability and accuracy.
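
The VISTA abstract notes that a GRPO group made up entirely of failures or entirely of successes yields no useful relative advantage. The sketch below illustrates why, using the commonly described group-relative normalization (reward minus group mean, divided by group standard deviation). It is an illustration only, not the paper's implementation: the function name `group_relative_advantages`, the binary hit/miss rewards, and the `eps` stabilizer are all assumptions made for this example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against its own group.

    Hypothetical helper: subtracts the group mean and divides by the
    group standard deviation (plus eps to avoid division by zero).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Assumed binary rewards: 1.0 if the predicted click lands on the
# target element, 0.0 otherwise.
all_fail = group_relative_advantages([0.0, 0.0, 0.0, 0.0])  # hard instance
all_pass = group_relative_advantages([1.0, 1.0, 1.0, 1.0])  # easy instance
mixed = group_relative_advantages([1.0, 0.0, 0.0, 1.0])     # informative group

print(all_fail)  # every advantage is zero: no learning signal
print(all_pass)  # every advantage is zero: no learning signal
print(mixed)     # successes get positive, failures negative advantages
```

Only the mixed group produces non-zero advantages, which is the situation a training scheme needs in order to tell good rollouts from bad ones within a group.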