PulseAugur
GMGaze model achieves SOTA gaze estimation with CLIP and multiscale transformer

Researchers have introduced GMGaze, a novel approach to gaze estimation that combines a multi-scale transformer architecture with CLIP-based, context-aware conditioning. The method addresses limitations of existing models through early fusion of image features and a Mixture-of-Experts (MoE) design for efficient computational scaling. GMGaze reports state-of-the-art performance on multiple benchmarks, with improved accuracy in both within-domain and cross-domain gaze estimation.
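The MoE idea behind the efficiency claim can be illustrated with a minimal sketch: a gating network routes each token to its top-k experts, so compute scales with the number of active experts rather than all of them. All names, sizes, and the expert design below are illustrative assumptions, not details from the GMGaze paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def moe_layer(tokens, gate_w, expert_ws, top_k=2):
    """Route each token to its top_k experts and mix their outputs.

    tokens:    (n_tokens, dim) input features
    gate_w:    (dim, n_experts) gating weights
    expert_ws: (n_experts, dim, dim) one tiny MLP weight matrix per expert
    """
    probs = softmax(tokens @ gate_w)              # (n_tokens, n_experts)
    out = np.zeros_like(tokens)
    for i, (tok, p) in enumerate(zip(tokens, probs)):
        chosen = np.argsort(p)[-top_k:]           # indices of the top_k experts
        weights = p[chosen] / p[chosen].sum()     # renormalise their gate weights
        for w, e in zip(weights, chosen):
            out[i] += w * np.tanh(tok @ expert_ws[e])  # hypothetical expert MLP
    return out

n_tokens, dim, n_experts = 4, 8, 4
tokens = rng.standard_normal((n_tokens, dim))
gate_w = rng.standard_normal((dim, n_experts))
expert_ws = rng.standard_normal((n_experts, dim, dim))

y = moe_layer(tokens, gate_w, expert_ws)
print(y.shape)  # (4, 8)
```

Only `top_k` of the `n_experts` expert networks run per token, which is the source of the "efficient computational scaling" the summary mentions.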

Summary written by gemini-2.5-flash-lite from 2 sources.

IMPACT Introduces a new architecture for gaze estimation, potentially improving accuracy and efficiency in applications requiring eye-tracking.

RANK_REASON Academic paper introducing a new model architecture and benchmark results.

Read on arXiv cs.CV →

COVERAGE [2]

  1. arXiv cs.CV TIER_1 · Xinyuan Zhao, Yihang Wu, Ahmad Chaddad, Sarah A. Alkhodair, Reem Kateb ·

    GMGaze: MoE-Based Context-Aware Gaze Estimation with CLIP and Multiscale Transformer

    arXiv:2605.00799v1 Announce Type: new Abstract: Gaze estimation methods commonly use facial appearance to predict the direction of a person's gaze. However, previous studies show three major challenges with convolutional neural network (CNN)-based, transformer-based, and contrasti…

  2. arXiv cs.CV TIER_1 · Reem Kateb ·

    GMGaze: MoE-Based Context-Aware Gaze Estimation with CLIP and Multiscale Transformer

    Gaze estimation methods commonly use facial appearance to predict the direction of a person's gaze. However, previous studies show three major challenges with convolutional neural network (CNN)-based, transformer-based, and contrastive language-image pre-training (CLIP)-based meth…