New research explores advanced video generation and manipulation with diffusion models

By PulseAugur Editorial · [13 sources] · 2026-06-26 05:39

Researchers are exploring new techniques for video generation and manipulation with diffusion models. One approach integrates State Space Models (SSMs) into video diffusion models as temporal layers so that longer sequences can be generated more efficiently; the authors report lower memory use than attention-based temporal layers on long sequences while keeping generation quality competitive. Other work improves temporal consistency when relighting long videos with diffusion transformers and self-conditioning, and reconstructs 4D hand motion from video by reusing pretrained video diffusion model representations. Further methods target efficient few-step video restoration and robust point tracking by adapting diffusion model features and training strategies.
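The memory argument for SSMs can be made concrete. Full temporal attention over T frames builds a T x T score matrix, while a state-space layer scans the frames with a fixed-size hidden state. The sketch below is a toy scalar linear recurrence (h_t = a*h_{t-1} + b*x_t, y_t = c*h_t) written only for illustration; it is not the architecture from the SSM paper, and the parameters a, b and c are arbitrary assumptions.

```python
from typing import List


def ssm_scan(x: List[float], a: float = 0.9, b: float = 1.0, c: float = 1.0) -> List[float]:
    """Toy scalar linear state-space layer run over per-frame features.

    The recurrent state h is a single number no matter how many frames
    there are; only the output list grows with sequence length T.
    """
    h = 0.0
    ys: List[float] = []
    for x_t in x:
        h = a * h + b * x_t  # decay the previous state, mix in the current frame
        ys.append(c * h)     # read this frame's output out of the state
    return ys


def attention_score_entries(num_frames: int) -> int:
    """Entries in a full T x T temporal attention score matrix (grows as T**2)."""
    return num_frames * num_frames


if __name__ == "__main__":
    print(ssm_scan([1.0, 0.0, 0.0], a=0.5))  # [1.0, 0.5, 0.25]: an impulse decays geometrically
    for t in (16, 64, 256):
        print(t, attention_score_entries(t))  # second column: 256, 4096, 65536
```

Real SSM layers use vector-valued states and learned, structured parameters, but the scaling contrast is the same: the recurrent state does not grow with the number of frames, whereas the attention score matrix grows quadratically.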

IMPACT Advances in video diffusion models promise more efficient and coherent video generation, improved relighting, and better reconstruction of complex motions like hand movements.

AI-generated summary · Google Gemini · from 13 sources.

COVERAGE [13]

arXiv cs.AI TIER_1 English(EN) · Yujin Tang, Tian Zhou, Xin Lin, Cheng Tan, Yifan Hu, Rong Jin, SouYoung Jin, Liang Sun · 2026-07-01 04:00

Learning Video Dynamics with Predictive Differentiable Rendering

arXiv:2606.31050v1 Announce Type: cross Abstract: How to accurately predict a high-fidelity future world? While the visual world is inherently continuous, existing deterministic video prediction models operate in discrete pixel space and are mainly optimized with pixel-wise mean …
arXiv cs.AI TIER_1 English(EN) · Yuta Oshima, Shohei Taniguchi, Masahiro Suzuki, Yutaka Matsuo · 2026-06-30 04:00

SSM Meets Video Diffusion Models: Efficient Long-Term Video Generation with Structured State Spaces

arXiv:2403.07711v5 Announce Type: replace-cross Abstract: Given the remarkable achievements in image generation through diffusion models, the research community has shown increasing interest in extending these models to video generation. Recent diffusion models for video generati…
arXiv cs.LG TIER_1 English(EN) · Jing Yang, Mayoore Jaiswal, Zian Wang, Steven Zeng, Rochelle Pereira, Yajie Zhao, Jianyuan Min · 2026-06-30 04:00

HorizonRelight: Relighting Long-horizon Videos Consistently via Diffusion Transformers

arXiv:2606.29095v1 Announce Type: cross Abstract: Diffusion-based video relighting enables controllable relighting from a single input video, but modern video diffusion backbones are trained on short clips and applied to long-horizon videos through chunked sliding-window inferenc…
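
The HorizonRelight abstract traces the consistency problem to running short-clip backbones over long videos with chunked sliding-window inference. A minimal sketch of such a schedule, assuming a fixed window length and stride (both hypothetical values here), shows where chunk boundaries fall and how many chunks cover each frame. It illustrates the generic baseline the abstract describes, not HorizonRelight's own method.

```python
from typing import List, Tuple


def sliding_windows(num_frames: int, window: int, stride: int) -> List[Tuple[int, int]]:
    """Cover frames [0, num_frames) with [start, end) windows of fixed length.

    Consecutive windows overlap by (window - stride) frames; the last window is
    shifted so that it ends exactly at the final frame.
    """
    if window <= 0 or stride <= 0 or stride > window:
        raise ValueError("need 0 < stride <= window")
    if num_frames <= window:
        return [(0, num_frames)]
    starts = list(range(0, num_frames - window + 1, stride))
    if starts[-1] + window < num_frames:
        starts.append(num_frames - window)
    return [(s, s + window) for s in starts]


def overlap_counts(num_frames: int, windows: List[Tuple[int, int]]) -> List[int]:
    """Number of windows producing each frame; overlapping outputs could be averaged by these counts."""
    counts = [0] * num_frames
    for start, end in windows:
        for f in range(start, end):
            counts[f] += 1
    return counts


if __name__ == "__main__":
    wins = sliding_windows(20, window=8, stride=6)
    print(wins)                      # [(0, 8), (6, 14), (12, 20)]
    print(overlap_counts(20, wins))  # frames 6-7 and 12-13 are covered twice
```

Each chunk is processed with only its own frames in view, and neighbouring chunks share only the overlapping frames, which gives one intuition for why lighting can drift from chunk to chunk over a long video.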
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-29 00:00

The Surprising Effectiveness of Video Diffusion Models for Hand Motion Reconstruction

ViDiHand uses pretrained video diffusion model representations with hand-overlay rendering to reconstruct 4D hand motion directly from video frames without detectors or optimization.
arXiv cs.CV TIER_1 English(EN) · Haoran Bai, Xiaoxu Chen, Xiaoyu Liu, Zongsheng Yue, Sibin Deng, Wangmeng Zuo, Ying Chen · 2026-06-30 04:00

SATB-VR: Training Few-Step Video Restoration Diffusion Model using SNR-Aware Trajectory Blending

arXiv:2606.28677v1 Announce Type: new Abstract: While diffusion models excel in video restoration, their reliance on extensive iterative steps limits efficiency. Conversely, aggressive single-step distillation often compromises fine texture recovery. To achieve an optimal balance…
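
The excerpt does not define the "SNR" in SATB-VR's name or describe its blending rule. Under the standard DDPM notation, which is an assumption here, the signal-to-noise ratio at step t is SNR(t) = alpha_bar_t / (1 - alpha_bar_t), where alpha_bar_t is the running product of (1 - beta_s). The sketch below computes it for the common linear beta schedule to show how sharply SNR falls across the trajectory; it does not reproduce the paper's method.

```python
from typing import List


def linear_betas(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Linearly spaced noise variances beta_1..beta_T (the common DDPM default)."""
    if T < 2:
        raise ValueError("T must be at least 2")
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]


def snr_schedule(betas: List[float]) -> List[float]:
    """SNR(t) = alpha_bar_t / (1 - alpha_bar_t), with alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    snrs: List[float] = []
    alpha_bar = 1.0
    for beta in betas:
        alpha_bar *= 1.0 - beta
        snrs.append(alpha_bar / (1.0 - alpha_bar))
    return snrs


if __name__ == "__main__":
    snr = snr_schedule(linear_betas())
    # Early steps are almost clean (high SNR); late steps are almost pure noise.
    print(f"t=1: {snr[0]:.1f}  t=500: {snr[499]:.3f}  t=1000: {snr[-1]:.2e}")
```

Because SNR spans many orders of magnitude, a few-step sampler that keeps only a handful of steps has to choose carefully which noise levels it visits.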
arXiv cs.CV TIER_1 English(EN) · Yuxi Wang, Chengkai Jin, Yufei Liu, Wenqi Ouyang, Tianyi Wei, Zhiwei Zeng, Siyuan Huang, Zhiqi Shen, Xingang Pan · 2026-06-30 04:00

The Surprising Effectiveness of Video Diffusion Models for Hand Motion Reconstruction

arXiv:2606.30308v1 Announce Type: new Abstract: 4D hand motion reconstruction from egocentric video is bottlenecked by clear limitations of existing methods: image-based pipelines depend on a detector that fails under heavy occlusion, while video-based methods rely on temporal mo…
arXiv cs.CV TIER_1 English(EN) · Haoran Bai, Xiaoxu Chen, Canqian Yang, Zongyao He, Sibin Deng, Ying Chen · 2026-06-30 04:00

Vivid-VR: Distilling Concepts from Text-to-Video Diffusion Transformer for Photorealistic Video Restoration

arXiv:2508.14483v4 Announce Type: replace Abstract: We present Vivid-VR, a DiT-based generative video restoration method built upon an advanced T2V foundation model, where ControlNet is leveraged to control the generation process, ensuring content consistency. However, convention…
arXiv cs.CV TIER_1 English(EN) · Soowon Son, Honggyu An, Jisu Nam, Hyunah Ko, Chaehyun Kim, Dahyun Chung, Siyoon Jin, Jung Yi, Junhwa Hur, Seungryong Kim · 2026-06-30 04:00

Probing and Leveraging Video Diffusion Transformer Features for Robust Point Tracking

arXiv:2512.20606v2 Announce Type: replace Abstract: Despite achieving strong results on standard benchmarks, current point tracking methods rely on feature backbones that are rarely designed with the temporal coherence needed for robust real-world performance. While recent works …
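
Trackers built on a feature backbone typically localize a query point in a later frame by comparing the point's feature vector against that frame's features. The sketch below shows this generic nearest-neighbour step with cosine similarity over a tiny hand-made feature grid. The feature values are placeholders, not video diffusion transformer activations, and the paper's own tracking head is not described in the excerpt.

```python
import math
from typing import List, Sequence, Tuple

# A feature map indexed as grid[row][col] -> feature vector.
FeatureGrid = List[List[List[float]]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def match_point(query: Sequence[float], target: FeatureGrid) -> Tuple[int, int]:
    """Return the (row, col) in the target frame whose feature is most similar to the query."""
    best_score = -math.inf
    best_pos = (0, 0)
    for r, row in enumerate(target):
        for c, feat in enumerate(row):
            score = cosine(query, feat)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos


if __name__ == "__main__":
    # Placeholder 2x2 feature map for the next frame (not real model features).
    next_frame = [
        [[0.0, 1.0], [0.2, 1.0]],
        [[0.7, 0.7], [0.9, 0.1]],
    ]
    print(match_point([1.0, 0.0], next_frame))  # (1, 1)
```

The quality of this matching step depends entirely on how distinctive and temporally stable the features are, which is the property the paper probes in video diffusion transformer features.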
arXiv cs.CV TIER_1 English(EN) · Zengqun Zhao, Ziquan Liu, Yu Cao, Shaogang Gong, Zhensong Zhang, Jifei Song, Jiankang Deng, Ioannis Patras · 2026-06-30 04:00

LatSearch: Latent Reward-Guided Search for Faster Inference-Time Scaling in Video Diffusion

arXiv:2603.14526v2 Announce Type: replace Abstract: The recent success of inference-time scaling in large language models has inspired similar explorations in video diffusion. In particular, motivated by the existence of "golden noise" that enhances video quality, prior work has …
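
Inference-time scaling in this setting means spending extra compute on choosing the starting noise. The simplest form is best-of-N: sample N noise vectors, generate from each, score the results with a reward model, and keep the best. The sketch below uses hypothetical stand-ins `toy_generate` and `toy_reward` (toy functions, not a video model or LatSearch's latent reward). Judging by its title, LatSearch's contribution is to make this search faster by guiding it with a reward computed in latent space, which the sketch does not attempt.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple


def best_of_n_noise(
    generate: Callable[[List[float]], Sequence[float]],
    reward: Callable[[Sequence[float]], float],
    n: int,
    dim: int,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Sample n Gaussian noise vectors, generate from each, keep the highest-reward one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)
    best_noise: Optional[List[float]] = None
    best_score = -math.inf
    for _ in range(n):
        noise = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        score = reward(generate(noise))  # each candidate costs one full generation here
        if score > best_score:
            best_noise, best_score = noise, score
    assert best_noise is not None
    return best_noise, best_score


# Hypothetical stand-ins for a video generator and a reward model.
def toy_generate(noise: List[float]) -> List[float]:
    return [math.tanh(z) for z in noise]


def toy_reward(sample: Sequence[float]) -> float:
    return -sum(s * s for s in sample)  # arbitrary: prefers low-magnitude outputs


if __name__ == "__main__":
    for n in (1, 4, 16):
        _, score = best_of_n_noise(toy_generate, toy_reward, n=n, dim=8)
        print(n, round(score, 3))  # with a fixed seed, the best score never decreases as n grows
```

With a fixed seed, a larger n draws the same first candidates and then more, so the best score can only stay the same or improve. The cost also grows linearly with n, which is why faster search strategies are of interest.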
arXiv cs.CV TIER_1 English(EN) · Xingang Pan · 2026-06-29 13:53

The Surprising Effectiveness of Video Diffusion Models for Hand Motion Reconstruction

4D hand motion reconstruction from egocentric video is bottlenecked by clear limitations of existing methods: image-based pipelines depend on a detector that fails under heavy occlusion, while video-based methods rely on temporal modules learned only from scarce hand-pose annotat…
arXiv cs.CV TIER_1 English(EN) · Xi Ye, Wenjia Yang, Yangyang Xu, Xiaoyang Liu, Duo Su, Mengfei Xia, Jun Zhu · 2026-06-29 04:00

SHIFT: Motion Alignment in Video Diffusion Models with Adversarial Hybrid Fine-Tuning

arXiv:2603.17426v2 Announce Type: replace Abstract: Image-conditioned video diffusion models achieve impressive visual realism but often suffer from weakened motion fidelity, e.g., reduced motion dynamics or degraded long-term temporal coherence, especially after fine-tuning. We …
arXiv cs.CV TIER_1 English(EN) · Ruoyu Wang, Jialun Liu, Huayang Huang, Haibin Huang, Jiepeng Wang, Chi Zhang, Xuelong Li, Yu Wu · 2026-06-29 04:00

SIFT: Self-Imagination Fine-Tuning for Physically Plausible Motion in Video Diffusion Models

arXiv:2606.27741v1 Announce Type: new Abstract: Recent advances in video diffusion models have greatly improved visual fidelity, yet their generated motions often violate physical plausibility. We observe a common kinematic failure, "motion entanglement", the unintended coupling …
arXiv cs.CV TIER_1 English(EN) · Yu Wu · 2026-06-26 05:39

SIFT: Self-Imagination Fine-Tuning for Physically Plausible Motion in Video Diffusion Models

Recent advances in video diffusion models have greatly improved visual fidelity, yet their generated motions often violate physical plausibility. We observe a common kinematic failure, "motion entanglement", the unintended coupling of independent motion sources, such as camera mo…
