PulseAugur

InterCMDM framework enables controllable, long-horizon human interaction generation

Researchers have introduced InterCMDM, a block-causal latent diffusion framework for autoregressive human interaction generation. The model uses a Dual-Stream Causal Diffusion Transformer that keeps a separate causal stream for each individual and models inter-person dependencies through unified dual-stream attention with multi-task attention masks. Selecting a mask at inference time controls the coordination behavior: simultaneous actions, reactive responses, or leader-follower dynamics. The framework's block-wise diffusion objective enables stable latent rollouts over extended sequences without repeated decode-encode cycles. The authors report state-of-the-art performance on benchmarks such as InterHuman and Inter-X, with improved text-motion alignment, realism, and long-horizon continuity.
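
To make the mask-selection idea concrete, here is a minimal NumPy sketch of how block-causal attention masks for two motion streams might be built and switched by mode. It is an illustration under stated assumptions, not the paper's implementation: the token layout (all of person A's tokens followed by all of person B's), the function names block_causal and dual_stream_mask, and the exact visibility rules chosen for each mode are hypothetical, since the summary does not specify them.

```python
import numpy as np


def block_causal(num_blocks: int, block_size: int, strict: bool = False) -> np.ndarray:
    """Boolean mask where a query token in block i may attend to key tokens
    in blocks j <= i (or j < i when strict=True). Rows are queries, columns keys."""
    block_ids = np.repeat(np.arange(num_blocks), block_size)
    query_block = block_ids[:, None]
    key_block = block_ids[None, :]
    return key_block < query_block if strict else key_block <= query_block


def dual_stream_mask(num_blocks: int, block_size: int, mode: str) -> np.ndarray:
    """Joint mask over [A tokens | B tokens]. The mode names and the rules
    below are assumptions made for illustration only."""
    n = num_blocks * block_size
    within = block_causal(num_blocks, block_size)   # each person sees own past and current block
    blocked = np.zeros((n, n), dtype=bool)          # no cross-person visibility

    if mode == "simultaneous":
        # Assumed: both partners see each other up to the current block.
        a_sees_b, b_sees_a = within, within
    elif mode == "reactive":
        # Assumed: A acts independently, B reacts to A.
        a_sees_b, b_sees_a = blocked, within
    elif mode == "leader_follower":
        # Assumed: leader A sees only B's past blocks, follower B also sees A's current block.
        a_sees_b = block_causal(num_blocks, block_size, strict=True)
        b_sees_a = within
    else:
        raise ValueError(f"unknown mode: {mode}")

    return np.block([[within, a_sees_b], [b_sees_a, within]])


if __name__ == "__main__":
    for mode in ("simultaneous", "reactive", "leader_follower"):
        mask = dual_stream_mask(num_blocks=3, block_size=2, mode=mode)
        print(mode, mask.shape)
        print(mask.astype(int))
```

In an attention layer, a boolean mask of this kind is typically applied by setting the logits of disallowed query-key pairs to negative infinity before the softmax, so the same trained weights can produce different coordination patterns depending on which mask is supplied.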

IMPACT This research advances controllable and long-horizon generation for human interactions, potentially impacting animation, robotics, and virtual reality applications.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.CV TIER_1 English(EN) · Qing Yu, Kent Fujiwara

    InterCMDM: Block-Causal Diffusion for Autoregressive Human Interaction Generation

    arXiv:2607.01743v1. Abstract: Text-conditioned human interaction generation must capture both long-range temporal causality within each individual and tightly coupled coordination between partners. Existing interaction diffusion models typically denoise full seq…