inclusionAI has released Vista 9B and Vista 4B, vision-language models for GUI grounding. Built on Qwen3.5 backbones, they are trained with a view-consistent GRPO approach and self-verified cross-view anchoring. Given a natural-language instruction and a screenshot, the models predict click coordinates within a normalized frame, enabling precise interaction with graphical user interfaces.
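Because the models output clicks in a normalized frame, a caller still has to map each prediction back to the pixel grid of the actual screenshot before acting on it. The summary does not say which range the frame uses, so the sketch below assumes a hypothetical 0 to 1000 grid (the `scale` parameter). The helper name `to_pixels` is illustrative only and is not part of any Vista API.

```python
def to_pixels(x_norm: float, y_norm: float, width: int, height: int,
              scale: float = 1000.0) -> tuple[int, int]:
    """Map a point in an assumed [0, scale] normalized frame to pixel coordinates.

    Hypothetical helper: the real normalization range used by Vista is not
    stated in the source, so `scale` is an assumption.
    """
    if not (0 <= x_norm <= scale and 0 <= y_norm <= scale):
        raise ValueError("coordinates fall outside the normalized frame")
    # Rescale to the screenshot size, then clamp so the edge value `scale`
    # lands on the last valid pixel index instead of one past it.
    px = min(int(round(x_norm / scale * width)), width - 1)
    py = min(int(round(y_norm / scale * height)), height - 1)
    return px, py


# Example: a predicted click at (500, 250) on a 1920x1080 screenshot.
print(to_pixels(500, 250, 1920, 1080))  # (960, 270)
```

Keeping the frame resolution-independent lets one prediction apply to screenshots of any size; only this final rescaling step depends on the display.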
IMPACT These models advance GUI grounding capabilities, potentially improving human-computer interaction and automation in software.
RANK_REASON Release of new models with novel training techniques from a research-oriented entity.
AI-generated summary · Google Gemini · from 1 source.