MMDiT
PulseAugur coverage of MMDiT — every cluster mentioning MMDiT across labs, papers, and developer communities, ranked by signal.
1 day with sentiment data
-
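New framework creates lightweight diffusion models through knowledge distillation
Researchers have developed a new knowledge distillation framework called LIFT and PLACE to create more efficient diffusion models. The method uses a coarse-to-fine alignment strategy to address the difficulty a student model faces when imitating a complex teacher model. Experiments show that the method is effective across a range of diffusion model types and tasks, reaching an FID score of 15.73 even when the student model is significantly compressed.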
-
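New benchmarks challenge the spatial and functional reasoning abilities of multimodal large language models (MLLMs)
Researchers have introduced new benchmarks for evaluating the spatial and functional reasoning abilities of multimodal large language models (MLLMs). The benchmarks are designed to go beyond basic geometric perception and assess higher-level cognitive abilities, such as structured spatial reasoning and understanding the utility of objects in specific contexts. Experiments show that current MLLMs struggle to integrate spatial memory, functional reasoning, and external knowledge, which highlights a major bottleneck on the path to embodied intelligence.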
-
AttnRouter enhances image editing on MMDiT with per-category attention routing
Researchers have developed AttnRouter, a novel method for training-free image editing on the MMDiT model. This approach utilizes KVInject, a single-forward attention manipulation that blends source-image key/value proje…
-
OccDirector: Language-Guided Behavior and Interaction Generation in 4D Occupancy Space
Researchers have introduced OccDirector, a new framework designed to generate complex 4D occupancy dynamics for autonomous driving simulations based solely on natural language instructions. This system acts as a "scenar…