PulseAugur

GMGaze model achieves SOTA gaze estimation with CLIP and multiscale transformer

Researchers have introduced GMGaze, a gaze estimation model that combines CLIP with a multi-scale transformer and context-aware conditioning. It addresses limitations of existing models by fusing image features early and by using a Mixture-of-Experts (MoE) design to scale computation efficiently. GMGaze reports state-of-the-art results on multiple benchmarks, with improved accuracy in both within-domain and cross-domain gaze estimation.
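For readers unfamiliar with the MoE idea mentioned in the summary, the sketch below shows generic top-k sparse routing in plain Python. It is an illustration only and not GMGaze's implementation: the linear router, the function names (`softmax`, `moe_forward`), the toy experts, and the choice of k=2 are all assumptions made for this example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, gate_weights, experts, k=2):
    """Route feature vector x to the top-k experts and mix their outputs.

    gate_weights: one weight vector per expert (a hypothetical linear router).
    experts: callables mapping a vector to a vector of the same length.
    """
    # Router score per expert: dot product of x with that expert's gate vector.
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_weights]
    # Keep only the k highest-scoring experts; the others are never evaluated,
    # which is where the compute saving of sparse MoE comes from.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Renormalise the gate over the selected experts only.
    probs = softmax([scores[i] for i in top])
    out = [0.0] * len(x)
    for p, i in zip(probs, top):
        y = experts[i](x)
        out = [o + p * y_j for o, y_j in zip(out, y)]
    return out, top

# Toy experts and router weights, chosen purely for illustration.
experts = [
    lambda v: [2.0 * a for a in v],
    lambda v: [a + 1.0 for a in v],
    lambda v: [-a for a in v],
]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
y, chosen = moe_forward([3.0, 1.0], gate_weights, experts, k=2)
print(chosen, y)
```

Only two of the three experts run for this input, so cost grows with k rather than with the total number of experts.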

IMPACT Introduces a new architecture for gaze estimation, potentially improving accuracy and efficiency in applications requiring eye-tracking.

RANK_REASON Academic paper introducing a new model architecture and benchmark results.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.CV TIER_1 English(EN) · Xinyuan Zhao, Yihang Wu, Ahmad Chaddad, Sarah A. Alkhodair, Reem Kateb

    GMGaze: MoE-Based Context-Aware Gaze Estimation with CLIP and Multiscale Transformer

    arXiv:2605.00799v1. Abstract: Gaze estimation methods commonly use facial appearances to predict the direction of a person's gaze. However, previous studies show three major challenges with convolutional neural network (CNN)-based, transformer-based, and contrasti…

  2. arXiv cs.CV TIER_1 English(EN) · Reem Kateb

    GMGaze: MoE-Based Context-Aware Gaze Estimation with CLIP and Multiscale Transformer

    Gaze estimation methods commonly use facial appearances to predict the direction of a person's gaze. However, previous studies show three major challenges with convolutional neural network (CNN)-based, transformer-based, and contrastive language-image pre-training (CLIP)-based meth…