Qwen-RobotWorld Technical Report: Unifying Embodied World Modeling through Language-Conditioned Video Generation
Researchers have introduced Qwen-RobotWorld, a language-conditioned video world model for embodied intelligence. The model uses a double-stream diffusion transformer and an extensive embodied world knowledge corpus to predict future visual trajectories across various robotic domains. Qwen-RobotWorld achieves top rankings on benchmarks such as EWMBench and DreamGen Bench, and outperforms other open-source models on WorldModelBench and PBench.
AI IMPACT: This model could accelerate the development of embodied AI by providing a unified framework for training and evaluation across diverse robotic tasks.