Vision-language-action model
PulseAugur coverage of Vision-language-action model — every cluster mentioning Vision-language-action model across labs, papers, and developer communities, ranked by signal.
1 day with sentiment data
-
Embodied AI redefines computer vision's role at CVPR 2026
Embodied AI is shifting the focus of computer vision research, moving from understanding static images to enabling intelligent agents to interact with and manipulate the real world. This paradigm shift, evident at CVPR …
-
NVIDIA's Jim Fan: VLA and teleoperation dead, World Action Models rise
NVIDIA's Jim Fan declared the end of Vision-Language-Action (VLA) models and teleoperation in robotics, advocating for World Action Models (WAM) as the new paradigm. Fan proposed that WAMs, inspired by Large Language…
-
Airbnb revenue up 18% to $2.7B, while XPeng's AI driving usage surges
Airbnb reported an 18% year-over-year increase in first-quarter revenue, reaching $2.7 billion. The company also achieved a net profit of $160 million and adjusted EBITDA of $519 million, a 24% increase. Separately, Xpen…
-
XPeng's second-gen VLA system achieves over 50% autonomous driving mileage
XPeng's second-generation VLA system has achieved over 50% of its intelligent driving mileage within a month of its rollout. During the May Day holiday, the AI-assisted driving system saw a daily usage rate of 93.21%, a…
-
MobileEgo Anywhere releases 200-hour egocentric dataset for VLA models
Researchers have introduced MobileEgo Anywhere, a framework and dataset designed to collect extensive egocentric data using commodity mobile devices. This initiative aims to overcome the limitations of existing datasets…
-
AhaRobot: Low-cost open-source bimanual manipulator for embodied AI
Researchers have developed AhaRobot, a low-cost, open-source bimanual mobile manipulator designed to facilitate embodied AI research. The system features a novel SCARA-like dual-arm design for reduced motor torque and a…
-
New attack method targets Transformer vulnerabilities in autonomous driving systems
Researchers have developed a new gray-box attack framework called Adversarial Flow Matching (AFM) that targets vulnerabilities in Transformer modules used by end-to-end autonomous driving systems. AFM can generate visua…
-
VLA emerges as top solution for embodied AI, despite sensory limitations
Vision-Language-Action (VLA) models are currently the leading architecture for embodied AI due to their strong task generalization capabilities. However, VLA has limitations, particularly in tactile and proprioceptive s…
-
AI² Robotics launches NeuroVLA model and open-source AlphaBrain platform
AI² Robotics founder Guo Yandong has introduced the NeuroVLA model, which he described as a brain-inspired architecture. He also launched the AlphaBrain Platform, an open-source toolkit designed to support plug-and-play…
-
FASTER model slashes VLA reaction latency for real-time robotics
Researchers have developed a new method called FASTER to improve the real-time responsiveness of Vision-Language-Action (VLA) models. Existing methods often delay action until all sampling steps are complete, creating a…
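The latency problem the summary describes, waiting for every sampling step before any action is emitted, can be sketched with an early-exit loop. This is an illustrative toy only: the contracting denoiser, the tolerance, and the step counts are assumptions, not details from the FASTER paper.

```python
import numpy as np

def sample_action(denoise_step, x, total_steps=20, tol=1e-3):
    """Run iterative denoising but emit the action early once
    successive refinements stop changing, instead of always
    waiting for all sampling steps."""
    for step in range(total_steps):
        x_next = denoise_step(x, step)
        if np.max(np.abs(x_next - x)) < tol:   # early exit
            return x_next, step + 1
        x = x_next
    return x, total_steps

# Toy denoiser: contracts halfway toward a fixed target action
target = np.array([0.3, 0.7])
step_fn = lambda x, t: x + 0.5 * (target - x)

action, used = sample_action(step_fn, np.zeros(2))
```

With this toy denoiser the loop stops well before the full 20-step budget, which is the kind of reaction-latency saving the headline claims.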
-
XPeng GX debuts with advanced VLA driving tech, boosting Ultra model orders
XPeng unveiled its new GX model and the second-generation VLA intelligent driving system at the Beijing Auto Show. The company reported a significant increase in orders for its Ultra models, with a 118% month-over-month…
-
New VLA models LaST-R1 and DIAL enhance robotic manipulation with advanced reasoning
Two new research papers introduce advanced Vision-Language-Action (VLA) models for robotic manipulation. LaST-R1 integrates latent Chain-of-Thought reasoning with reinforcement learning to improve adaptability and gener…
-
Robotic control framework GeCO uses iterative optimization for adaptive, robust actions
Researchers have developed a new framework called Generative Control as Optimization (GeCO) that reframes robotic control from trajectory integration to iterative optimization. This approach allows for adaptive computat…
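The "control as iterative optimization" framing can be illustrated with a minimal sketch: instead of integrating one fixed trajectory, an action is refined by gradient steps on a task cost, and the step budget controls the compute/precision trade-off. The quadratic cost, learning rate, and step counts below are assumptions for illustration, not GeCO's actual formulation.

```python
import numpy as np

def refine_action(action, cost_grad, steps=10, lr=0.1):
    """Refine an action by gradient descent on a task cost.
    Fewer steps give cheaper, coarser control; more steps give
    finer control, i.e. adaptive computation."""
    for _ in range(steps):
        action = action - lr * cost_grad(action)
    return action

# Toy cost: squared distance of a 2-D end-effector action to a target
target = np.array([0.5, -0.2])
grad = lambda a: 2.0 * (a - target)

coarse = refine_action(np.zeros(2), grad, steps=3)   # quick, rough
fine = refine_action(np.zeros(2), grad, steps=50)    # slower, precise
```

The same loop serves both regimes; only the iteration budget changes, which is one way a framework can trade robustness against reaction time at run time.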
-
New methods KERV and HeiSD accelerate embodied VLA models with kinematic awareness
Two new research papers introduce methods to accelerate the inference speed of Vision-Language-Action (VLA) models used for robot control. KERV utilizes a Kalman Filter to predict actions and adjust acceptance threshold…
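The Kalman-filter idea in the KERV summary, predicting the next action cheaply and only accepting the prediction when it agrees with the model, can be sketched for a single scalar action dimension. The noise parameters, the 0.05 acceptance threshold, and the toy action stream are assumptions for illustration; the paper's actual design is not given in the snippet.

```python
import numpy as np

def kalman_step(x, P, z, Q=1e-3, R=1e-2):
    """One scalar Kalman filter step: predict the next action
    value, then correct with the observed action z."""
    # Predict (identity dynamics)
    x_pred, P_pred = x, P + Q
    # Update
    K = P_pred / (P_pred + R)          # Kalman gain
    x = x_pred + K * (z - x_pred)
    P = (1 - K) * P_pred
    return x, P, x_pred

# Toy action stream: a joint angle drifting smoothly
x, P = 0.0, 1.0
for z in [0.10, 0.19, 0.31, 0.42]:
    x, P, x_pred = kalman_step(x, P, z)
    # Accept the cheap prediction only when it is close to the
    # model's action; otherwise run full VLA inference.
    accept = abs(x_pred - z) < 0.05
```

The filter's covariance shrinks as evidence accumulates, so predictions become trustworthy enough to skip expensive inference steps on smooth motion.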
-
New datasets aim to improve linguistic diversity and spatial alignment for embodied AI
Two new datasets aim to improve embodied AI research by addressing limitations in existing data. One paper, "Limited Linguistic Diversity in Embodied AI Datasets," audits current corpora and finds they often use repetit…
-
AsyncShield: A Plug-and-Play Edge Adapter for Asynchronous Cloud-based VLA Navigation
Researchers have developed AsyncShield, a new framework designed to improve the navigation capabilities of Vision-Language-Action (VLA) models on mobile robots. This system addresses the latency and network jitter issue…
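The latency and jitter problem the summary names has a standard edge-side pattern: run the control loop at a fixed rate, consume a cloud VLA action when one arrives in time, and otherwise fall back to a cheap local policy. The timeout, tick count, and hold-last fallback below are assumptions for illustration, not AsyncShield's actual design.

```python
import queue

def edge_control_loop(cloud_actions, fallback, ticks=5, timeout=0.05):
    """Fixed-rate control loop: use the latest cloud action when one
    has arrived within the timeout, otherwise apply a local fallback
    so network jitter never stalls the robot."""
    last = None
    log = []
    for _ in range(ticks):
        try:
            last = cloud_actions.get(timeout=timeout)  # fresh cloud action
            log.append(("cloud", last))
        except queue.Empty:
            log.append(("edge", fallback(last)))       # local fallback
    return log

q = queue.Queue()
q.put([0.1, 0.2])                        # one action arrives; then jitter
hold = lambda prev: prev or [0.0, 0.0]   # hypothetical hold-last policy
log = edge_control_loop(q, hold)
```

Here the first tick consumes the cloud action and the remaining ticks run on the edge fallback, which is the plug-and-play behavior the headline describes.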
-
SimOne 4.0 simulation platform integrates with NVIDIA, aiding AI in physical world
Wuyiyishijie has released the preliminary version of its SimOne 4.0 simulation platform, built on world models and VLA technology. This upgrade covers data, training, inference, validation, and delivery, aiming to enhan…
-
Libra-VLA model introduces coarse-to-fine dual-system for robotic manipulation
Researchers have introduced Libra-VLA, a new Vision-Language-Action (VLA) model designed for robotic manipulation. Unlike previous monolithic approaches, Libra-VLA employs a coarse-to-fine dual-system architecture. This…
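The coarse-to-fine dual-system idea can be sketched as a slow planner that sets a subgoal every few ticks while a fast controller steers toward the current subgoal on every tick. The replan interval, gains, and dynamics below are assumptions for illustration; Libra-VLA's actual architecture is not given in the snippet.

```python
import numpy as np

def dual_system_rollout(state, target, ticks=12, replan_every=4):
    """Coarse-to-fine control: a slow planner ("System 2") picks a
    subgoal every few ticks; a fast controller ("System 1") steps
    toward the current subgoal on every tick."""
    subgoal = state
    traj = []
    for t in range(ticks):
        if t % replan_every == 0:                 # coarse: replan subgoal
            subgoal = state + 0.5 * (target - state)
        state = state + 0.4 * (subgoal - state)   # fine: fast local step
        traj.append(state.copy())
    return traj

target = np.array([1.0, 1.0])
traj = dual_system_rollout(np.zeros(2), target)
```

Decoupling the two loops lets the expensive planner run at a fraction of the control rate while the end effector still moves smoothly, which is the stated motivation for dual-system designs.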
-
GM trains scalable driving AI using simulation and VLA models for rare scenarios
General Motors is developing advanced AI systems to tackle the complex challenges of autonomous driving, particularly focusing on rare and unpredictable "long-tail" scenarios. They are employing a combination of large-s…