Cosmos 3: Omnimodal World Models for Physical AI
Researchers have introduced Cosmos 3, a new family of omnimodal world models capable of processing and generating data across language, image, video, audio, and action sequences. This unified architecture effectively subsumes various specialized models into a single framework for Physical AI. Cosmos 3 has achieved state-of-the-art results on multiple understanding and generation tasks, positioning it as a scalable backbone for embodied agents. The project has released its code, model checkpoints, datasets, and benchmark to foster open research.
AI IMPACT: Establishes a unified framework for embodied agents, potentially accelerating development in physical AI applications.