PulseAugur

New dataset and distillation method boost small GUI-grounding AI models

Researchers have developed WinDOM, a new dataset and method for training small GUI-grounding agents of approximately 2B parameters. The WinDOM corpus, which contains over 54,000 records, was generated by automating interactions with a web-based reimplementation of Windows 11 and extracting bounding boxes directly from the Document Object Model (DOM), with no human annotation or OCR. The dataset is paired with Self-Family Distillation (SFD), a technique that relies on either the student model's own evolving state or a larger teacher model from the same family for training. In experiments, a Qwen3.5-2B model fine-tuned with SFD-4B and Early-init RL achieved significant gains over the base model on various benchmarks.
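The summary gives no implementation details for WinDOM's pipeline, so what follows is only a minimal sketch of the general idea behind DOM-derived labels. When a UI is rendered as a web page, each element's on-screen rectangle can be read from the DOM (for example, via the browser's `Element.getBoundingClientRect()`), so neither OCR nor human labeling is needed. The `Node` class, the `grounding_records` function, the set of interactive tags, and the instruction template are all hypothetical. The rectangles are hard-coded here rather than taken from a live browser.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Node:
    """Hypothetical stand-in for a rendered DOM element.

    x, y, width, height mimic the values a browser's
    getBoundingClientRect() reports, in CSS pixels.
    """
    tag: str
    label: str  # e.g. aria-label or visible text
    x: float
    y: float
    width: float
    height: float
    children: list["Node"] = field(default_factory=list)


def iter_nodes(root: Node) -> Iterator[Node]:
    """Depth-first, document-order traversal of the element tree."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        # Push children in reverse so they are visited in document order.
        stack.extend(reversed(node.children))


def grounding_records(root, viewport_w, viewport_h,
                      interactive=("button", "a", "input")):
    """Hypothetical helper: turn labeled interactive elements into
    (instruction, normalized bbox) records without OCR or annotation."""
    records = []
    for node in iter_nodes(root):
        if node.tag not in interactive or not node.label:
            continue
        # Clip the element's rectangle to the visible viewport.
        x1, y1 = max(0.0, node.x), max(0.0, node.y)
        x2 = min(float(viewport_w), node.x + node.width)
        y2 = min(float(viewport_h), node.y + node.height)
        # Drop zero-area or fully off-screen elements.
        if x2 <= x1 or y2 <= y1:
            continue
        records.append({
            "instruction": f"Click '{node.label}'",
            # (x1, y1, x2, y2) normalized to [0, 1] by viewport size.
            "bbox": (x1 / viewport_w, y1 / viewport_h,
                     x2 / viewport_w, y2 / viewport_h),
        })
    return records


# Toy desktop page with hard-coded rectangles (1920x1080 viewport).
desktop = Node("div", "", 0, 0, 1920, 1080, [
    Node("button", "Start", 0, 1040, 48, 40),
    Node("a", "Settings", 100, 200, 80, 24),
    Node("button", "Hidden", 2000, 0, 10, 10),  # off-screen, dropped
])
records = grounding_records(desktop, 1920, 1080)
for r in records:
    print(r)
```

Clipping to the viewport and dropping off-screen elements keeps each box consistent with what a screenshot of the same state would show. Normalizing coordinates by viewport size is one common convention; it is an assumption here and is not taken from the paper.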

IMPACT This research offers a novel approach to training smaller, more efficient AI models for GUI grounding, potentially enabling wider on-device deployment and accessibility tools.

RANK_REASON The cluster contains a research paper detailing a new dataset and training methodology for AI models.


AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.LG · Chengheng Li-Chen, Zhiqian Zhou, Hao Chen, Nicolas Chauvin

    WinDOM: Self-Family Distillation for Small-Model GUI Grounding

    arXiv:2606.25964v1 (cross-list). Abstract: Small (~2B) GUI-grounding agents are attractive for on-device deployment, accessibility tooling, and low-cost iteration, but at this scale they face two open recipe questions: how to obtain bounding-box training data without …

  2. arXiv cs.AI · Nicolas Chauvin

    WinDOM: Self-Family Distillation for Small-Model GUI Grounding

    Small (~2B) GUI-grounding agents are attractive for on-device deployment, accessibility tooling, and low-cost iteration, but at this scale they face two open recipe questions: how to obtain bounding-box training data without expensive human annotation, and how to combine sup…