YOSE framework speeds up video object removal with token selection

By the PulseAugur editorial team · [1 source] · 2026-05-01 04:00

Researchers have introduced YOSE, a framework that speeds up video object removal with Diffusion Transformer (DiT) models. Rather than computing over the entire video frame, YOSE adaptively selects only the essential tokens for processing. This mask-aware acceleration lets inference time scale with the size of the masked region, yielding up to a 2.5x speedup in many scenarios while keeping visual quality comparable to existing methods.
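The general idea of mask-aware token selection can be illustrated with a minimal sketch. This is not the YOSE implementation: the function names, the patch size, and the rule "a token is essential if any pixel in its patch is masked" are assumptions made purely for illustration, and the per-token function stands in for whatever a real DiT block would compute.

```python
import numpy as np

def select_masked_tokens(mask, patch=2):
    """Return flat indices of tokens whose patch overlaps the removal mask.

    mask:  (T, H, W) boolean array, True where the object should be removed.
    patch: hypothetical spatial patch size; H and W must be divisible by it.
    """
    t, h, w = mask.shape
    # Group pixels into (patch x patch) tiles, one tile per token.
    tiles = mask.reshape(t, h // patch, patch, w // patch, patch)
    # Assumed rule: a token is selected if any pixel in its tile is masked.
    token_mask = tiles.any(axis=(2, 4))
    return np.flatnonzero(token_mask), token_mask.size

def process_selected(tokens, idx, fn):
    """Apply fn only to the selected tokens and scatter the results back."""
    out = tokens.copy()
    out[idx] = fn(tokens[idx])
    return out

# Two 4x4 frames with a small masked square in the top-left corner.
mask = np.zeros((2, 4, 4), dtype=bool)
mask[:, 0:2, 0:2] = True

idx, total = select_masked_tokens(mask, patch=2)
tokens = np.ones((total, 3))  # placeholder token features
result = process_selected(tokens, idx, lambda x: x * 2.0)
```

In this toy setup only 2 of the 8 tokens are passed to the per-token function, so the work done shrinks as the masked region shrinks, which is the scaling behavior the summary describes.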

Impact: Accelerates video object removal by making DiT-based methods more computationally efficient.

Ranking rationale: Academic paper introducing a new method for improving efficiency in AI-driven video processing.

AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.CV · English · Chenyang Wu, Lina Lei, Fan Li, Chun-Le Guo, Dehong Kong, Xinran Qin, Zhixin Wang, Ming-Ming Cheng, Chongyi Li · 2026-05-01 04:00

YOSE: You Only Select Essential Tokens for Efficient DiT-based Video Object Removal

arXiv:2604.27322v1. Abstract: Recent advances in Diffusion Transformer (DiT)-based video generation technologies have shown impressive results for video object removal. However, these methods still suffer from substantial inference latency. For instance, althoug…
