PulseAugur
English(EN) The Surprising Effectiveness of Video Diffusion Models for Hand Motion Reconstruction

New research explores diffusion models for advanced video generation and processing

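Researchers are exploring advanced techniques that use diffusion models to improve video generation and processing. One approach integrates state space models (SSMs) into video diffusion models to improve efficiency and handle longer sequences, outperforming attention-based methods in memory use and performance. Other work focuses on improving temporal consistency in video relighting with diffusion Transformers and adaptation, and on reconstructing 4D hand motion from video by leveraging pretrained video diffusion models. Methods for efficient video restoration and robust point tracking are also being developed by adapting the features and training strategies of diffusion models.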

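Impact: Advances in video diffusion models are expected to enable more efficient and more coherent video generation, improved relighting, and better reconstruction of complex motion such as hand movement.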

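Ranking rationale: Multiple research papers detail novel methods and improvements in video generation, restoration, and tracking that use diffusion models and related architectures.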

Read on Hugging Face Daily Papers →

AI-generated summary · Google Gemini · from 13 sources. How we write summaries →


Sources [13]

  1. arXiv cs.AI TIER_1 English(EN) · Yujin Tang, Tian Zhou, Xin Lin, Cheng Tan, Yifan Hu, Rong Jin, SouYoung Jin, Liang Sun ·

    Learning Video Dynamics with Predictive Differentiable Rendering

    arXiv:2606.31050v1 Announce Type: cross Abstract: How to accurately predict a high-fidelity future world? While the visual world is inherently continuous, existing deterministic video prediction models operate in discrete pixel space and are mainly optimized with pixel-wise mean …

  2. arXiv cs.AI TIER_1 English(EN) · Yuta Oshima, Shohei Taniguchi, Masahiro Suzuki, Yutaka Matsuo ·

    SSM Meets Video Diffusion Models: Efficient Long-Term Video Generation with Structured State Spaces

    arXiv:2403.07711v5 Announce Type: replace-cross Abstract: Given the remarkable achievements in image generation through diffusion models, the research community has shown increasing interest in extending these models to video generation. Recent diffusion models for video generati…

  3. arXiv cs.LG TIER_1 English(EN) · Jing Yang, Mayoore Jaiswal, Zian Wang, Steven Zeng, Rochelle Pereira, Yajie Zhao, Jianyuan Min ·

    HorizonRelight: Consistent Relighting of Long-Horizon Videos via Diffusion Transformers

    arXiv:2606.29095v1 Announce Type: cross Abstract: Diffusion-based video relighting enables controllable relighting from a single input video, but modern video diffusion backbones are trained on short clips and applied to long-horizon videos through chunked sliding-window inferenc…

  4. Hugging Face Daily Papers TIER_1 English(EN) ·

    The Surprising Effectiveness of Video Diffusion Models for Hand Motion Reconstruction

    ViDiHand uses pretrained video diffusion model representations with hand-overlay rendering to reconstruct 4D hand motion directly from video frames without detectors or optimization.

  5. arXiv cs.CV TIER_1 English(EN) · Haoran Bai, Xiaoxu Chen, Xiaoyu Liu, Zongsheng Yue, Sibin Deng, Wangmeng Zuo, Ying Chen ·

    SATB-VR: Training Few-Step Video Restoration Diffusion Models with SNR-Aware Trajectory Blending

    arXiv:2606.28677v1 Announce Type: new Abstract: While diffusion models excel in video restoration, their reliance on extensive iterative steps limits efficiency. Conversely, aggressive single-step distillation often compromises fine texture recovery. To achieve an optimal balance…

  6. arXiv cs.CV TIER_1 English(EN) · Yuxi Wang, Chengkai Jin, Yufei Liu, Wenqi Ouyang, Tianyi Wei, Zhiwei Zeng, Siyuan Huang, Zhiqi Shen, Xingang Pan ·

    The Surprising Effectiveness of Video Diffusion Models for Hand Motion Reconstruction

    arXiv:2606.30308v1 Announce Type: new Abstract: 4D hand motion reconstruction from egocentric video is bottlenecked by clear limitations of existing methods: image-based pipelines depend on a detector that fails under heavy occlusion, while video-based methods rely on temporal mo…

  7. arXiv cs.CV TIER_1 English(EN) · Haoran Bai, Xiaoxu Chen, Canqian Yang, Zongyao He, Sibin Deng, Ying Chen ·

    Vivid-VR: Distilling Concepts from Text-to-Video Diffusion Transformer for Photorealistic Video Restoration

    arXiv:2508.14483v4 Announce Type: replace Abstract: We present Vivid-VR, a DiT-based generative video restoration method built upon an advanced T2V foundation model, where ControlNet is leveraged to control the generation process, ensuring content consistency. However, convention…

  8. arXiv cs.CV TIER_1 English(EN) · Soowon Son, Honggyu An, Jisu Nam, Hyunah Ko, Chaehyun Kim, Dahyun Chung, Siyoon Jin, Jung Yi, Junhwa Hur, Seungryong Kim ·

    Exploring and Exploiting Video Diffusion Transformer Features for Robust Point Tracking

    arXiv:2512.20606v2 Announce Type: replace Abstract: Despite achieving strong results on standard benchmarks, current point tracking methods rely on feature backbones that are rarely designed with the temporal coherence needed for robust real-world performance. While recent works …

  9. arXiv cs.CV TIER_1 English(EN) · Zengqun Zhao, Ziquan Liu, Yu Cao, Shaogang Gong, Zhensong Zhang, Jifei Song, Jiankang Deng, Ioannis Patras ·

    LatSearch: Latent Reward-Guided Search for Faster Inference-Time Scaling in Video Diffusion Models

    arXiv:2603.14526v2 Announce Type: replace Abstract: The recent success of inference-time scaling in large language models has inspired similar explorations in video diffusion. In particular, motivated by the existence of "golden noise" that enhances video quality, prior work has …

  10. arXiv cs.CV TIER_1 English(EN) · Xingang Pan ·

    The Surprising Effectiveness of Video Diffusion Models for Hand Motion Reconstruction

    4D hand motion reconstruction from egocentric video is bottlenecked by clear limitations of existing methods: image-based pipelines depend on a detector that fails under heavy occlusion, while video-based methods rely on temporal modules learned only from scarce hand-pose annotat…

  11. arXiv cs.CV TIER_1 English(EN) · Xi Ye, Wenjia Yang, Yangyang Xu, Xiaoyang Liu, Duo Su, Mengfei Xia, Jun Zhu ·

    SHIFT: Motion Alignment in Video Diffusion Models with Adversarial Hybrid Fine-Tuning

    arXiv:2603.17426v2 Announce Type: replace Abstract: Image-conditioned video diffusion models achieve impressive visual realism but often suffer from weakened motion fidelity, e.g., reduced motion dynamics or degraded long-term temporal coherence, especially after fine-tuning. We …

  12. arXiv cs.CV TIER_1 English(EN) · Ruoyu Wang, Jialun Liu, Huayang Huang, Haibin Huang, Jiepeng Wang, Chi Zhang, Xuelong Li, Yu Wu ·

    SIFT: Self-Imagined Fine-Tuning for Physically Plausible Motion in Video Diffusion Models

    arXiv:2606.27741v1 Announce Type: new Abstract: Recent advances in video diffusion models have greatly improved visual fidelity, yet their generated motions often violate physical plausibility. We observe a common kinematic failure, "motion entanglement", the unintended coupling …

  13. arXiv cs.CV TIER_1 English(EN) · Yu Wu ·

    SIFT: Self-Imagined Fine-Tuning for Physically Plausible Motion in Video Diffusion Models

    Recent advances in video diffusion models have greatly improved visual fidelity, yet their generated motions often violate physical plausibility. We observe a common kinematic failure, "motion entanglement", the unintended coupling of independent motion sources, such as camera mo…