VLESA: Vision-Language Embodied Safety Agent for Human Activity Monitoring
Researchers have developed VLESA, a Vision-Language Embodied Safety Agent that monitors human activity in egocentric (first-person) video and intervenes in real time to prevent dangerous actions. The framework targets intent-dependent safety, where the risk of an action depends on the goal behind it. VLESA draws on a new dataset of goal-conditioned safety annotations and uses a Q-filter trained with GRPO (Group Relative Policy Optimization) to evaluate actions based on the actor's inferred intent. On the ASIMOV-2.0 benchmark, the system improved intervention accuracy and raised action safety by more than 41 percentage points.
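For readers unfamiliar with GRPO, the sketch below illustrates its core idea and the kind of gating decision a safety filter makes. It is a minimal illustration under stated assumptions, not VLESA's implementation. GRPO scores a group of sampled responses to the same input and normalizes each reward against the group's own mean and standard deviation, so no separate value network is needed. The reward scheme (1 for a correct intervene/allow judgment, 0 otherwise), the use of population standard deviation, and the `should_intervene` threshold gate are all hypothetical choices for illustration.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: each sampled response's reward is normalized
    against the mean and (population) standard deviation of its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every response in the group scored the same
    return [(r - mean) / (std + eps) for r in rewards]


def should_intervene(q_value, threshold=0.0):
    """Hypothetical Q-filter gate (assumption, not from the source): flag an
    action for intervention when its estimated safety value, conditioned on
    the inferred goal, falls below a fixed threshold."""
    return q_value < threshold


if __name__ == "__main__":
    # Hypothetical example: four sampled safety judgments for one video clip,
    # scored 1.0 if the judgment matched the goal-conditioned annotation.
    rewards = [1.0, 0.0, 0.0, 1.0]
    print(group_relative_advantages(rewards))
    print(should_intervene(-0.2), should_intervene(0.5))
```

Correct judgments receive positive advantages and incorrect ones negative advantages, so training pushes the filter toward the judgments that outperformed their own group.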
Impact: Enhances safety protocols for AI systems that operate in physical environments, potentially reducing accidents.