VIMCAN network fuses Mamba and attention for real-time 3D human pose estimation

By PulseAugur Editorial · [1 source] · 2026-05-08 10:28

Researchers have developed VIMCAN, a hybrid network for visual-inertial 3D human pose estimation. The architecture combines Mamba's efficient sequence modeling with the spatial reasoning of cross-attention to fuse RGB keypoint and IMU data. VIMCAN achieves state-of-the-art accuracy, outperforming Transformer-based methods on benchmarks such as TotalCapture and 3DPW, while running real-time inference at over 60 FPS on consumer hardware.
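
To make the fusion idea concrete, here is a minimal sketch in plain Python. It is not the VIMCAN implementation: the toy recurrence stands in for a state-space layer and is not Mamba's selective scan, the choice of keypoint features as queries and IMU features as keys and values is an assumption, and all function names and input values are hypothetical. Learned projection weights are omitted.

```python
import math


def linear_scan(seq, decay=0.9):
    """Toy linear-time recurrence: h_t = decay * h_{t-1} + (1 - decay) * x_t.

    Stand-in for a state-space layer (assumption); NOT Mamba's selective scan.
    """
    state = [0.0] * len(seq[0])
    out = []
    for x in seq:
        state = [decay * s + (1.0 - decay) * v for s, v in zip(state, x)]
        out.append(state)
    return out


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention without learned projections."""
    dim = len(queries[0])
    fused = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused


# Hypothetical toy inputs: 3 frames of 2-D keypoint features, 2 IMU tokens.
keypoint_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
imu_feats = [[0.5, 0.5], [1.0, 0.0]]

temporal = linear_scan(keypoint_feats)                    # one pass over time
fused = cross_attention(temporal, imu_feats, imu_feats)   # keypoints attend to IMU
```

Each row of `fused` is a weighted average of the IMU vectors, with weights set by how closely the temporally smoothed keypoint features match each IMU token.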

IMPACT Introduces a more efficient architecture for real-time 3D human pose estimation, with potential applications in robotics and augmented reality.

RANK_REASON Publication of a new academic paper detailing a novel network architecture.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CV TIER_1 English(EN) · Bin Li · 2026-05-08 10:28

VIMCAN: Visual-Inertial 3D Human Pose Estimation with Hybrid Mamba-Cross-Attention Network

The rapid advances in deep learning have significantly enhanced the accuracy of multimodal 3D human pose estimation (HPE). However, the state-of-the-art (SOTA) HPE pipelines still rely on Transformers, whose quadratic complexity makes real-time processing for long sequences impra…
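
A rough illustration of the scaling argument, under simplifying assumptions: full self-attention compares every time step with every other, so its score count grows with the square of the sequence length, whereas a recurrent scan visits each step once. The function names below are hypothetical, and the counts ignore feature dimension, constant factors, and hardware effects.

```python
def attention_score_count(seq_len):
    """Pairwise scores computed by full self-attention over seq_len steps."""
    return seq_len * seq_len


def scan_step_count(seq_len):
    """State updates performed by a single recurrent scan over seq_len steps."""
    return seq_len


# Compare how the two counts grow as the sequence gets longer.
for seq_len in (64, 256, 1024):
    pairs = attention_score_count(seq_len)
    steps = scan_step_count(seq_len)
    print(f"T={seq_len}: attention scores={pairs}, scan steps={steps}, ratio={pairs // steps}")
```

The ratio between the two counts equals the sequence length itself, so the gap widens as sequences get longer.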
