inclusionAI releases Vista 9B/4B GUI-grounding models

By PulseAugur Editorial · [1 source] · 2026-06-13 07:50

inclusionAI has released Vista 9B and Vista 4B, vision-language models for GUI grounding built on Qwen3.5 backbones. They are trained with a view-consistent GRPO approach and self-verified cross-view anchoring. Given a natural language instruction and a screenshot, the models output click coordinates in a normalized frame, enabling precise interaction with graphical user interfaces.
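To make the normalized-frame idea concrete, here is a minimal sketch of turning a normalized click point into pixel coordinates for a given screenshot size. The function name `to_pixels` and the `scale` parameter are hypothetical, and the summary does not state which range the Vista models use for their normalized frame (for example 0 to 1 or 0 to 1000), so check the model card before relying on either value.

```python
def to_pixels(x_norm, y_norm, width, height, scale=1.0):
    """Map a click point from a normalized frame to pixel coordinates.

    `scale` is the upper bound of the normalized range. It is an assumption
    here, not a documented property of the Vista models.
    """
    if not (0 <= x_norm <= scale and 0 <= y_norm <= scale):
        raise ValueError("point lies outside the normalized frame")
    # Clamp so a coordinate equal to `scale` still lands on the last pixel.
    x_px = min(int(x_norm / scale * width), width - 1)
    y_px = min(int(y_norm / scale * height), height - 1)
    return x_px, y_px


if __name__ == "__main__":
    # A point at the horizontal centre, a quarter of the way down a 1920x1080 screenshot.
    print(to_pixels(0.5, 0.25, 1920, 1080))
```

Because the output is resolution-independent, the same prediction can be replayed on a screenshot of any size by changing only `width` and `height`.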

IMPACT These models advance GUI grounding capabilities, potentially improving human-computer interaction and automation in software.

RANK_REASON Release of new models with novel training techniques from a research-oriented entity.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

r/LocalLLaMA TIER_1 English(EN) · /u/jacek2023 · 2026-06-13 07:50

Vista 9B/4B from inclusionAI

VISTA-9B (https://huggingface.co/inclusionAI/VISTA-9B#vista-9b)

VISTA-9B are GUI-grounding vision-language models trained from Qwen3.5 9B backbones with VISTA: View-Consistent Self-Verified Training for GUI Grounding…
