Vision-Language-Action Models
PulseAugur coverage of Vision-Language-Action Models: every cluster mentioning Vision-Language-Action models across labs, papers, and developer communities, ranked by signal.
-
New benchmarks and methods improve AI agent uncertainty quantification
Researchers have developed new methods for quantifying uncertainty in AI agents that interact with graphical user interfaces (GUIs) and in vision-language-action models (VLAs) used in robotics. The first study, "Argus,"…
-
New AI models tackle long-horizon planning for autonomous driving
Researchers are developing advanced AI models for autonomous driving, focusing on improving trajectory planning and long-horizon decision-making. Several new frameworks, including ParkingTransformer, TerraTransfer, Alig…
-
New robot policy models enhance action generation and efficiency
Researchers have developed new methods for robot policy learning that improve efficiency and accuracy in action generation. LeaP, a learnable source prior, optimizes the starting point for action generation by condition…
-
Autoregressive policies achieve real-time execution in VLA models
A new research paper introduces a method for achieving real-time execution in autoregressive policies for Vision-Language-Action models. The approach involves adjusting the tokenization horizon and employing constrained…
-
Robots learn manipulation from human videos using keypoint tracking
Researchers have developed a new framework called Dexterous Point Policy that learns robotic manipulation skills directly from human videos, eliminating the need for costly robot-specific demonstrations. The system util…
-
New 'State Backdoor' attack targets embodied AI models
Researchers have developed a new type of backdoor attack targeting Vision-Language-Action (VLA) models, which are crucial for embodied AI applications like robotics. Unlike previous methods that rely on visible visual t…
-
CVPR 2026: computer vision and robotics merge, Chinese AI dominates
The CVPR 2026 conference in Denver marked a significant convergence of computer vision and robotics, with a strong emphasis on multimodal foundation models and embodied AI. Chinese universities and companies showcased s…
-
New AI defenses and attacks target vision-language models
Researchers have developed new methods both to defend against and to carry out backdoor attacks on advanced AI models. One approach, BYORn, aims to improve the robustness of large vision-language models by identifying and replaci…
-
Robotics VLA models gain foresight with mixture of horizons strategy
Researchers have developed a "mixture of horizons" (MoH) strategy to improve the performance of vision-language-action (VLA) models in robotics. This approach addresses the trade-off between long-term foresight and fine…
-
New benchmark reveals VLA models struggle with semantic grounding
Researchers have introduced RoboSemanticBench (RSB), a new benchmark designed to evaluate the semantic grounding capabilities of vision-language-action (VLA) models. The benchmark tests whether these models can accurate…
-
New research probes VLM susceptibility to visual persuasion and influence
Researchers are developing new frameworks to evaluate the susceptibility of Vision-Language Models (VLMs) to multimodal persuasion and visual influences. One study introduces MMPersuade to test agent-to-agent persuasion…
-
New VLA-Hijack attack exploits visual self-localization in AI models
Researchers have developed VLA-Hijack, a novel adversarial framework designed to exploit vulnerabilities in Vision-Language-Action (VLA) models. This method targets the models' reliance on visual self-localization of ro…
-
VLANeXt model offers recipe for stronger Vision-Language-Action models
Researchers have developed VLANeXt, a new Vision-Language-Action (VLA) model that improves upon existing architectures by systematically analyzing and optimizing design choices. Through a unified framework and evaluatio…
-
Pelican-Unified 1.0 model unifies embodied AI capabilities
Researchers have introduced Pelican-Unified 1.0, a novel embodied intelligence model that integrates understanding, reasoning, imagination, and action into a single system. This unified approach uses a single vision-lan…
-
RLDX-1 robotic policy enhances dexterous manipulation with new transformer architecture
Researchers have introduced RLDX-1, a new robotic policy designed for dexterous manipulation that integrates heterogeneous modalities through a Multi-Stream Action Transformer architecture. This approach aims to overcom…
-
RoboECC framework optimizes VLA model deployment across edge and cloud
Researchers have developed RoboECC, a new framework for deploying Vision-Language-Action (VLA) models by distributing their computation between edge devices and the cloud. This approach addresses the high inference cost…