GMGaze model achieves SOTA gaze estimation with CLIP and multiscale transformer

By PulseAugur Editorial · [2 sources] · 2026-05-01 17:35

Researchers have introduced GMGaze, a gaze estimation model that builds on CLIP and a multiscale transformer and adds context-aware conditioning. It targets limitations of existing CNN-based, transformer-based, and CLIP-based methods by fusing image features early and using a Mixture-of-Experts (MoE) design to scale computation efficiently. The authors report state-of-the-art results on multiple benchmarks, with improved accuracy in both within-domain and cross-domain gaze estimation.
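
For readers unfamiliar with the MoE idea mentioned above, the following is a minimal sketch of generic top-k sparse gating, in which a gate scores all experts but only the few highest-scoring ones run for a given input. It is not GMGaze's router: the summary does not describe the paper's routing scheme, and the class name `TinyMoE`, the linear experts, and all sizes are assumptions chosen for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class TinyMoE:
    """Hypothetical sparse MoE layer: linear gate plus linear experts."""

    def __init__(self, dim, n_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Gate: one score per expert, computed from the input features.
        self.gate = [[rng.gauss(0, 0.1) for _ in range(dim)]
                     for _ in range(n_experts)]
        # Experts: each is a dim x dim linear map (an assumption; real
        # experts are usually small feed-forward networks).
        self.experts = [[[rng.gauss(0, 0.1) for _ in range(dim)]
                         for _ in range(dim)]
                        for _ in range(n_experts)]

    def forward(self, x):
        logits = matvec(self.gate, x)
        # Keep only the top_k experts; the rest are never evaluated,
        # which is where the compute saving comes from.
        top = sorted(range(len(logits)), key=lambda i: logits[i],
                     reverse=True)[:self.top_k]
        weights = softmax([logits[i] for i in top])
        out = [0.0] * len(x)
        for w, i in zip(weights, top):
            y = matvec(self.experts[i], x)
            out = [o + w * yi for o, yi in zip(out, y)]
        return out, top, weights


moe = TinyMoE(dim=4, n_experts=8, top_k=2)
out, top, weights = moe.forward([0.5, -1.0, 0.25, 2.0])
print("selected experts:", top)
print("gate weights:", weights)
```

With 8 experts and `top_k=2`, each input pays for two expert evaluations rather than eight, so capacity can grow with the number of experts while per-input cost stays roughly fixed.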

IMPACT Introduces a new architecture for gaze estimation, potentially improving accuracy and efficiency in applications requiring eye-tracking.

RANK_REASON Academic paper introducing a new model architecture and benchmark results.


AI-generated summary · Google Gemini · from 2 sources.


COVERAGE [2]

arXiv cs.CV TIER_1 English(EN) · Xinyuan Zhao, Yihang Wu, Ahmad Chaddad, Sarah A. Alkhodair, Reem Kateb · 2026-05-04 04:00

GMGaze: MoE-Based Context-Aware Gaze Estimation with CLIP and Multiscale Transformer

arXiv:2605.00799v1 Announce Type: new Abstract: Gaze estimation methods commonly use facial appearances to predict the direction of a person's gaze. However, previous studies show three major challenges with convolutional neural network (CNN)-based, transformer-based, and contrasti…

arXiv cs.CV TIER_1 English(EN) · Reem Kateb · 2026-05-01 17:35

GMGaze: MoE-Based Context-Aware Gaze Estimation with CLIP and Multiscale Transformer

Gaze estimation methods commonly use facial appearances to predict the direction of a person's gaze. However, previous studies show three major challenges with convolutional neural network (CNN)-based, transformer-based, and contrastive language-image pre-training (CLIP)-based meth…

