MMDiT
PulseAugur coverage of MMDiT — every cluster mentioning MMDiT across labs, papers, and developer communities, ranked by signal.
-
AttnRouter enhances image editing on MMDiT with per-category attention routing
Researchers have developed AttnRouter, a training-free image-editing method for MMDiT models. It uses KVInject, a single-forward attention manipulation that blends source-image key/value proje…
-
New benchmarks challenge MLLMs' spatial and functional reasoning abilities
Researchers have introduced new benchmarks to evaluate the spatial and functional reasoning capabilities of multimodal large language models (MLLMs). These benchmarks aim to move beyond basic geometric perception to ass…
-
OccDirector: Language-Guided Behavior and Interaction Generation in 4D Occupancy Space
Researchers have introduced OccDirector, a framework that generates complex 4D occupancy dynamics for autonomous-driving simulations from natural-language instructions alone. This system acts as a "scenar…