PulseAugur

Mamoda2.5 model integrates multimodal AI with efficient DiT-MoE for top video editing

Researchers have introduced Mamoda2.5, a unified AR-Diffusion framework for multimodal understanding and generation. Its Diffusion Transformer (DiT) backbone uses a Mixture-of-Experts (MoE) design with 128 experts and Top-8 routing, giving a 25B-parameter model that activates only 3B parameters. Mamoda2.5 reports top-tier video editing quality on VBench 2.0, surpassing open-source models and rivaling proprietary ones such as Kling O1. The framework also combines distillation with reinforcement learning to compress its 30-step editing model into a 4-step version, reaching up to 95.9x faster inference than baselines.
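To make the "128 experts with Top-8 routing" description concrete, here is a minimal Python sketch of top-k expert routing for a single token. It is an illustration only, not the paper's implementation: the function names (`top_k_route`, `moe_forward`), the toy scalar experts, the random router logits, and the choice to apply softmax over the selected logits are all assumptions. Only the expert count, the top-k value, and the 25B/3B parameter figures come from the summary.

```python
import math
import random

NUM_EXPERTS = 128  # from the summary
TOP_K = 8          # from the summary


def top_k_route(logits, k=TOP_K):
    """Return (expert_indices, gate_weights) for one token.

    Assumption: softmax is taken over the k selected logits only, which is
    one common convention. The summary does not state the actual gating rule.
    """
    # Indices of the k largest router logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Numerically stable softmax restricted to the selected experts.
    m = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]


def moe_forward(x, logits, experts, k=TOP_K):
    """Gate-weighted sum of the k selected experts' outputs for input x."""
    idx, weights = top_k_route(logits, k)
    return sum(w * experts[i](x) for i, w in zip(idx, weights))


# Hypothetical toy experts: expert i simply scales its input by (i + 1).
experts = [lambda x, s=i + 1: s * x for i in range(NUM_EXPERTS)]

# Hypothetical router logits; a real router computes these from the token.
rng = random.Random(0)
router_logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

chosen, gate = top_k_route(router_logits)
y = moe_forward(1.0, router_logits, experts)

expert_fraction = TOP_K / NUM_EXPERTS  # share of experts run per token
param_fraction = 3 / 25                # active / total parameters (3B of 25B)
print(len(chosen), expert_fraction, param_fraction)
```

Only 8 of the 128 experts run per token (6.25%), while the summary's figures imply about 12% of parameters are active. One plausible reading is that non-expert layers such as attention always run, but the summary does not break this down.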

Impact: Mamoda2.5's efficient MoE architecture and accelerated inference could pave the way for more accessible and powerful multimodal AI tools.

Ranking rationale: This is a research paper describing a new multimodal model architecture and its benchmark performance.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 2 sources. How we write summaries →


Sources [2]

  1. arXiv cs.CV TIER_1 English(EN) · Yangming Shi, Shixiang Zhu, Tao Shen, Zhimiao Yu, Dengsheng Chen, Taicai Chen, Yunfei Yang, Juan Zhou, Chen Cheng, Liang Ma, Xibin Wu, Benxuan Yan, Ge Li, Tuoyu Zhang, Dan Li, Chang Liu, Zhenbang Sun ·

    Mamoda2.5: Enhancing Unified Multimodal Model with DiT-MoE

    arXiv:2605.02641v1. Abstract: We present Mamoda2.5, a unified AR-Diffusion framework that seamlessly integrates multimodal understanding and generation within a single architecture. To efficiently enhance the model's generation capability, we equip the Diffusion…

  2. arXiv cs.CV TIER_1 English(EN) · Zhenbang Sun ·

    Mamoda2.5: Enhancing Unified Multimodal Model with DiT-MoE

    We present Mamoda2.5, a unified AR-Diffusion framework that seamlessly integrates multimodal understanding and generation within a single architecture. To efficiently enhance the model's generation capability, we equip the Diffusion Transformer backbone with a fine-grained Mixtur…