Vision-language-action model
PulseAugur coverage of Vision-language-action model — every cluster mentioning Vision-language-action model across labs, papers, and developer communities, ranked by signal.
-
DriveStack-VLA enhances driving models with spatial intelligence and self-critique
Researchers have introduced DriveStack-VLA, a novel framework designed to enhance the spatial intelligence of vision-language-action driving models. This system leverages a large vision-language model backbone and incor…
-
New G3VLA module enhances robot manipulation VLA models with geometric awareness
Researchers have introduced G³VLA, a novel module designed to enhance Vision-Language-Action (VLA) models for robot manipulation. This module addresses the mismatch between 2D image coordinates and the calibrated geo…
-
New framework enables robots to adapt to new environments without retraining
Researchers have introduced In-Context World Modeling (ICWM), a new framework designed to improve the adaptability of robotic policies. ICWM treats system identification as an in-context adaptation problem, enabling rob…
-
New RECALL method improves VLA model learning with active data collection
Researchers have introduced RECALL, a novel approach to active lifelong learning for Vision-Language-Action (VLA) models. Unlike passive imitation learning, which requires failures to trigger data collection and offers …
-
New methods enhance VLA model efficiency and performance in robotics · 9 sources tracked
Researchers are developing new methods to improve the efficiency and performance of Vision-Language-Action (VLA) models in robotics. One approach, Flow Policy Optimization (FPO), uses reinforcement learning to fine-tune…
-
New protocol measures commonsense knowledge in VLA models
Researchers have developed Act2Answer, a new evaluation protocol designed to assess the commonsense and world knowledge retained by Vision-Language-Action (VLA) models after fine-tuning on robotics data. This protocol a…
-
Vision-language models lack agency and knowledge retention, new papers reveal
Two new research papers highlight limitations in current vision-language models (VLMs), particularly concerning their ability to retain knowledge after fine-tuning and their lack of "agency" in visual reasoning. The fir…
-
New QPILOTS method enhances reinforcement learning for diffusion policies
Researchers have introduced QPILOTS, a novel method designed to improve the efficiency of reinforcement learning (RL) for flow-matching and diffusion policies. This technique steers the denoising process at inference ti…
-
ScoutVLA model enhances UAV question answering with active perception
Researchers have introduced ScoutVLA, a novel dual-expert vision-language-action model designed for aerial embodied question answering. This model addresses the limitations of existing systems by enabling unmanned aeria…
-
Egocentric human video outperforms robot data for embodied AI pretraining
Researchers have found that egocentric human video can be a more effective and cost-efficient data source for pretraining embodied foundation models compared to traditional teleoperated robot trajectories. Studies indic…
-
New APT method boosts VLA model generalization with action expert pretraining
Researchers have developed a new method called APT (Action Expert Pretraining) to improve the generalization capabilities of Vision-Language-Action (VLA) models. These models, which combine vision-language understanding…
-
NUS develops FD-VLA model for enhanced robotic manipulation
Researchers from the National University of Singapore have developed FD-VLA, a novel Vision-Language-Action (VLA) model designed to improve robotic manipulation in contact-rich tasks. Unlike previous VLA models that pri…
-
Embodied AI redefines computer vision's role at CVPR 2026
Embodied AI is shifting the focus of computer vision research, moving from understanding static images to enabling intelligent agents to interact with and manipulate the real world. This paradigm shift, evident at CVPR …
-
NVIDIA's Jim Fan: VLA and teleoperation are dead, World Action Models rise
NVIDIA's Jim Fan declared the end of Vision-Language-Action (VLA) models and teleoperation in robotics, advocating for World Action Models (WAMs) as the new paradigm. Fan proposed that WAMs, inspired by Large Language…
-
Airbnb revenue up 18% to $2.7B, while XPeng's AI driving usage surges
Airbnb reported an 18% year-over-year increase in first-quarter revenue, reaching $2.7 billion. The company also posted a net profit of $160 million and adjusted EBITDA of $519 million, up 24%. Separately, Xpen…
-
XPeng's second-gen VLA system accounts for over 50% of intelligent driving mileage
XPeng's second-generation VLA system accounted for over 50% of the company's intelligent driving mileage within a month of its rollout. During the May Day holiday, the AI-assisted driving system saw a daily usage rate of 93.21%, a…
-
MobileEgo Anywhere releases 200-hour egocentric dataset for VLA models
Researchers have introduced MobileEgo Anywhere, a framework and dataset designed to collect extensive egocentric data using commodity mobile devices. This initiative aims to overcome the limitations of existing datasets…
-
AhaRobot: Low-cost open-source bimanual manipulator for embodied AI
Researchers have developed AhaRobot, a low-cost, open-source bimanual mobile manipulator designed to facilitate embodied AI research. The system features a novel SCARA-like dual-arm design for reduced motor torque and a…
-
New attack method targets Transformer vulnerabilities in autonomous driving systems
Researchers have developed a new gray-box attack framework called Adversarial Flow Matching (AFM) that targets vulnerabilities in Transformer modules used by end-to-end autonomous driving systems. AFM can generate visua…
-
VLA emerges as top solution for embodied AI, despite sensory limitations
Vision-Language-Action (VLA) models are currently the leading architecture for embodied AI due to their strong task generalization capabilities. However, VLA has limitations, particularly in tactile and proprioceptive s…