DeepGaze3.5-VL models human visual scanpaths using autoregressive token prediction

By PulseAugur Editorial · [1 sources] · 2026-07-03 04:00

Researchers have developed DeepGaze3.5-VL, a novel model that frames scanpath prediction as a discrete sequence modeling task using autoregressive token prediction. By mapping visual coordinates into a text vocabulary, the model leverages pretrained Vision-Language Models to capture diverse factors of variation, including personalized biases and task-specific objectives. This approach significantly improves predictive performance, achieving a new state-of-the-art on the MIT1003 dataset with a 46% improvement over its predecessor, DeepGaze III. The generative framework also offers a powerful tool for computational interventions and in-silico simulations of human visual attention. AI

IMPACT Establishes a new state-of-the-art in modeling human visual attention, with potential applications in interface design and cognitive state inference.

RANK_REASON The cluster describes a new research paper detailing a novel model and its performance on a benchmark dataset. [lever_c_demoted from research: ic=1 ai=1.0]

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 1 sources. How we write summaries →

DeepGaze3.5-VL models human visual scanpaths using autoregressive token prediction

COVERAGE [1]

arXiv cs.CV TIER_1 English(EN) · Susmit Agrawal, Matthias Bethge, Matthias K\"ummerer · 2026-07-03 04:00

DeepGaze3.5-VL: Modeling Scanpaths via Autoregressive Token Prediction

arXiv:2607.02083v1 Announce Type: new Abstract: Understanding human visual attention on a scene over time has applications in domains such as interface design and inferring cognitive states. Modeling visual scanpaths has historically relied on specialized architectures with hand-…

COVERAGE [1]

DeepGaze3.5-VL: Modeling Scanpaths via Autoregressive Token Prediction

RELATED ENTITIES

RELATED TOPICS