Brief · PulseAugur

TOOL · 雷峰网 (Leiphone) · Chinese (ZH) · 6h

Making Robot Actions More Grounded: Fudan et al. Propose GuidedVLA to Enhance VLA Controllability and Interpretability

Researchers from Fudan University, Shanghai Jiao Tong University, and OpenDriveLab have introduced GuidedVLA, an approach for improving the controllability and interpretability of Vision-Language-Action (VLA) models for robotics. The method explicitly guides the VLA's action generation by decomposing task-relevant factors into three components: target object localization, task stage recognition, and spatial geometric understanding. By incorporating specialized attention mechanisms for these factors, GuidedVLA aims to improve robot performance in complex, dynamic environments and to make failure diagnosis and system improvement more manageable.
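
The brief does not describe GuidedVLA's architecture in detail, so the following is only a toy sketch of the general idea of giving each task-relevant factor its own attention head whose weights can be inspected separately. It is not the GuidedVLA implementation: the function names (`attend`, `guided_features`), the factor keys, and the toy vectors are all illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, tokens):
    """Scaled dot-product attention of one query over a list of token vectors.

    Returns the attention weights and the weighted average of the tokens.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * t for q, t in zip(query, tok)) / scale for tok in tokens]
    weights = softmax(scores)
    dim = len(tokens[0])
    pooled = [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(dim)]
    return weights, pooled


def guided_features(tokens, factor_queries):
    """Run one attention head per named factor (hypothetical helper).

    Keeps each head's weights so they can be inspected on their own, and
    concatenates the pooled features as input for a downstream action head.
    """
    maps, features = {}, []
    for name, query in factor_queries.items():
        weights, pooled = attend(query, tokens)
        maps[name] = weights
        features.extend(pooled)
    return maps, features


# Toy inputs (assumed for illustration): three tokens and one query per factor.
tokens = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
factor_queries = {
    "target_object": [4.0, 0.0, 0.0],
    "task_stage": [0.0, 4.0, 0.0],
    "spatial_geometry": [0.0, 0.0, 4.0],
}

attention_maps, action_input = guided_features(tokens, factor_queries)
for name, weights in attention_maps.items():
    print(name, [round(w, 3) for w in weights])
```

In a setup like this, each factor has its own weight vector, so a failure could in principle be traced to a specific head (for example, the target-object head attending to the wrong token) rather than to the model as a whole.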

IMPACT Explicitly guiding action generation aims to improve robot task success and interpretability in complex real-world scenarios.

Qwen3-VL
SAM2
Shanghai Jiao Tong University
Fudan University
RoboTwin 2.0
LIBERO-Plus
Robotics: Science and Systems (RSS) 2026
OpenDriveLab
GuidedVLA
ALOHA AgileX
PSI-Bot RealMan