Researchers have introduced Mamoda2.5, a unified AR-Diffusion framework for multimodal understanding and generation. The model uses a Diffusion Transformer backbone with a Mixture-of-Experts (MoE) design of 128 experts and Top-8 routing, yielding a 25B-parameter model that activates only 3B parameters per token. Mamoda2.5 achieves top-tier video-editing quality on VBench 2.0, surpassing open-source models and rivaling proprietary ones such as Kling O1. The framework also uses a distillation and reinforcement learning approach to compress its 30-step editing model into a 4-step version, delivering up to 95.9x faster inference than baselines.
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT: Mamoda2.5's efficient MoE architecture and accelerated inference could pave the way for more accessible and powerful multimodal AI tools.
RANK_REASON: This is a research paper describing a new multimodal model architecture and its performance on benchmarks.