Researchers have developed WinDOM, a new dataset and method for training small GUI-grounding agents of approximately 2B parameters. The WinDOM corpus, containing over 54,000 records, was generated by automating interactions with a web reimplementation of Windows 11 and extracting bounding boxes directly from the Document Object Model, without human annotation or OCR. This approach is paired with Self-Family Distillation (SFD), a technique that trains the student using either its own evolving state or a larger teacher model from the same family. Experiments show that a Qwen3.5-2B model fine-tuned with SFD-4B and Early-init RL achieved significant gains on various benchmarks, outperforming the base model.
AI IMPACT This research offers a novel approach to training smaller, more efficient AI models for GUI grounding, potentially enabling wider on-device deployment and accessibility tools.
RANK_REASON The cluster contains a research paper detailing a new dataset and training methodology for AI models.
- Chengheng Li Chen
- Document Object Model
- GRPO
- Hugging Face
- Playwright
- Qwen3.5-2B
- Self-Family Distillation
- SFD-4B
- WinDOM
- Windows 11
AI-generated summary · Google Gemini · from 1 source.