PulseAugur

inclusionAI releases VISTA-9B and VISTA-4B GUI-grounding models

inclusionAI has released VISTA-9B and VISTA-4B, vision-language models for GUI grounding. Both are built on Qwen3.5 backbones and trained with VISTA (View-Consistent Self-Verified Training), which combines a view-consistent GRPO objective with self-verified cross-view anchoring. Given a natural-language instruction and a screenshot, the models predict a click coordinate in a normalized frame. This lets an agent act precisely on a graphical user interface regardless of the screenshot's resolution.
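The summary does not specify the model's output format or the range of its normalized frame. The sketch below is a minimal illustration built on two assumptions. First, that the model emits its click as text of the form `(x, y)`. Second, that both axes run from 0 to 1000. It shows how such a prediction maps back to pixels on a screenshot of any size. It also includes a binary hit-the-target reward and GRPO's group-standardized advantage, to show how a grounding policy is typically scored during RL training. `NORM_SCALE`, `Box`, `parse_click`, `to_pixels`, `click_reward` and `group_advantages` are all hypothetical names. This is not VISTA's view-consistent objective or its cross-view anchoring.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from statistics import mean, pstdev

# Assumption: both axes of the normalized frame run from 0 to NORM_SCALE.
# 1000 is a placeholder, not a value documented for VISTA.
NORM_SCALE = 1000

# Matches the first "(x, y)" pair of (optionally signed, optionally decimal) numbers.
_CLICK_RE = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


@dataclass
class Box:
    """Hypothetical target element bounding box, in screenshot pixels."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def parse_click(text: str) -> tuple[float, float] | None:
    """Extract the first '(x, y)' pair from model output, or None if absent."""
    m = _CLICK_RE.search(text)
    if m is None:
        return None
    return float(m.group(1)), float(m.group(2))


def to_pixels(x_norm: float, y_norm: float, width: int, height: int,
              scale: float = NORM_SCALE) -> tuple[int, int]:
    """Map normalized coordinates onto a width x height screenshot."""
    # Clamp into the frame so slightly out-of-range predictions stay on screen.
    x = min(max(x_norm, 0.0), scale) / scale * width
    y = min(max(y_norm, 0.0), scale) / scale * height
    return round(x), round(y)


def click_reward(text: str, target: Box, width: int, height: int) -> float:
    """Hypothetical binary reward: 1.0 if the parsed click lands inside target."""
    click = parse_click(text)
    if click is None:
        return 0.0  # unparseable output earns no reward
    px, py = to_pixels(click[0], click[1], width, height)
    return 1.0 if target.contains(px, py) else 0.0


def group_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: standardize rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        return [0.0 for _ in rewards]  # all samples equal: no learning signal
    return [(r - mu) / sigma for r in rewards]
```

For example, on a 1920x1080 screenshot the output `click(500, 250)` maps to pixel `(960, 270)`. The frame scale is the piece to confirm against the model card before relying on this mapping.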

IMPACT More accurate GUI grounding could make screen-driven agents and software automation more reliable, since each action depends on clicking the right on-screen element.

RANK_REASON Release of new models with novel training techniques from a research-oriented entity.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

  1. r/LocalLLaMA · TIER_1 · English (EN) · /u/jacek2023

    Vista 9B/4B from inclusionAI

    VISTA-9B (https://huggingface.co/inclusionAI/VISTA-9B#vista-9b)

    VISTA-9B are GUI-grounding vision-language models trained from Qwen3.5 9B backbones with VISTA: View-Consistent Self-Verified Training for GUI Grounding…