NVIDIA's Jim Fan declared the end of Vision-Language-Action (VLA) models and teleoperation in robotics, advocating World Action Models (WAMs) as the new paradigm. Fan proposed that WAMs, inspired by large language models (LLMs), will use next-state prediction and action fine-tuning for robot control. He emphasized a shift toward first-person human video as the primary training source, moving away from the limitations of teleoperation data collection.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT This commentary signals a potential shift in robotics research and development toward new model architectures and data strategies.
RANK_REASON This is a commentary on the future of robotics by a prominent researcher, not a direct model release or product announcement.