Brief · PulseAugur

RESEARCH · arXiv cs.CV · 1d · [5 sources]

Qwen-RobotWorld Technical Report: Unifying Embodied World Modeling through Language-Conditioned Video Generation

Researchers have introduced Qwen-RobotWorld, a language-conditioned video world model for embodied intelligence. The model uses a double-stream diffusion transformer and an extensive embodied world knowledge corpus to predict future visual trajectories across robotic domains. It achieves top rankings on EWMBench and DreamGen Bench and outperforms other open-source models on WorldModelBench and PBench.

IMPACT By providing a unified framework for training and evaluation across diverse robotic tasks, this model could accelerate the development of embodied AI.

Qwen-RobotWorld
Qwen2.5-VL
EWMBench
DreamGen Bench
WorldModelBench
PBench
Hugging Face
RoboTwin-IF
arXiv