Researchers have introduced Cosmos 3, a new family of omnimodal world models capable of processing and generating diverse data types including language, images, video, audio, and actions. This unified architecture, built on a mixture-of-transformers, aims to serve as a general-purpose backbone for embodied agents in Physical AI. Evaluations show Cosmos 3 achieving state-of-the-art results across various tasks, and its code, checkpoints, and datasets are being released under an open-source license to foster further research.
AI IMPACT Establishes a new SOTA for embodied agents by unifying multiple modalities, potentially accelerating Physical AI development.
Read on Hugging Face Daily Papers →
AI-generated summary · Google Gemini · from 2 sources.