New model ZGL uses language to improve human motion prediction

By PulseAugur Editorial · [1 source] · 2026-06-30 04:00

Researchers have developed ZGL, a language-conditioned model for predicting 3D human motion. It adds semantic guidance from motion descriptions to a strong motion prediction backbone. A vision-language model generates captions for the observed poses, CLIP-L encodes those captions, and the resulting conditioning tokens are injected into a Transformer through cross-attention adapters with zero gates. The zero gates let the model draw on language conditioning only when it improves prediction accuracy. ZGL improves performance on the Human3.6M dataset and transfers to the CMUMocap benchmark.
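The zero-gated cross-attention idea described above can be illustrated with a small sketch. This is not the authors' code: the class name `ZeroGatedCrossAttention`, the single-head layout, the scalar gate, and the token dimensions are all assumptions made for illustration, and a real implementation would use a tensor library with a learnable gate. The sketch only shows why a gate initialized to zero leaves the backbone's output untouched until training moves the gate away from zero.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class ZeroGatedCrossAttention:
    """Hypothetical single-head adapter: motion tokens attend to caption tokens."""

    def __init__(self, dim, text_dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.wq = rand_matrix(dim, dim, rng)       # query projection (motion side)
        self.wk = rand_matrix(text_dim, dim, rng)  # key projection (caption side)
        self.wv = rand_matrix(text_dim, dim, rng)  # value projection (caption side)
        self.gate = 0.0  # assumption: scalar gate, initialized to zero

    def __call__(self, motion_tokens, text_tokens):
        q = matmul(motion_tokens, self.wq)
        k = matmul(text_tokens, self.wk)
        v = matmul(text_tokens, self.wv)
        scale = 1.0 / math.sqrt(self.dim)
        scores = matmul(q, transpose(k))
        attn = [softmax([s * scale for s in row]) for row in scores]
        update = matmul(attn, v)
        # Residual connection: the caption-derived update is scaled by the gate.
        return [
            [x + self.gate * u for x, u in zip(x_row, u_row)]
            for x_row, u_row in zip(motion_tokens, update)
        ]


# Toy inputs: 4 motion tokens of width 8, 3 caption tokens of width 6.
rng = random.Random(1)
motion = rand_matrix(4, 8, rng)
text = rand_matrix(3, 6, rng)

adapter = ZeroGatedCrossAttention(dim=8, text_dim=6)
out = adapter(motion, text)
print(out == motion)  # True: with the gate at zero, the adapter is an identity map
```

Because the adapter starts as an identity map, the pose-only backbone's behavior is preserved at initialization, and language only influences the prediction once the gate takes a nonzero value.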

IMPACT Introduces a method for incorporating semantic understanding into motion prediction, potentially improving realism and controllability in animation and robotics.

RANK_REASON Publication of a new research paper on arXiv detailing a novel model for human motion prediction.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CV TIER_1 English(EN) · Guanhui Qiao, Lu Zhou, Ding Jiang, Jinqiao Wang · 2026-06-30 04:00

Zero-Gated Language-conditioned Human Motion Prediction

arXiv:2606.29208v1 Announce Type: new Abstract: Pose histories provide the core kinematic evidence for 3D human motion prediction, but they lack explicit high-level semantic guidance. This paper introduces ZGL, a lightweight language-conditioned predictor that uses captions of th…
