Brief · PulseAugur

RESEARCH · arXiv cs.LG English(EN) · 1w · [3 sources]

VLESA: Vision-Language Embodied Safety Agent for Human Activity Monitoring

Researchers have developed VLESA, a Vision-Language Embodied Safety Agent that monitors human activity through egocentric video and intervenes in real time to prevent dangerous actions. The framework targets intent-dependent safety, where the risk of an action depends on the goal behind it. VLESA combines a new dataset of goal-conditioned safety annotations with a GRPO-trained Q-filter that evaluates actions against the inferred intent. On the ASIMOV-2.0 benchmark, the system improved intervention accuracy and raised action safety by more than 41 percentage points.
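
For readers unfamiliar with GRPO (Group Relative Policy Optimization), the core idea is that several outputs are sampled for the same input and each one is scored relative to its own group rather than by a separate value network. The sketch below shows only that generic normalization step. It is not VLESA's training code: the function name, the reward values, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each sampled output against the mean and spread of its own group.

    Generic GRPO-style normalization; `eps` avoids division by zero when
    every reward in the group is identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std, an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled safety judgments of one
# (inferred intent, action) pair: 1.0 if the judgment matched the
# annotation, 0.0 if it did not. These numbers are made up.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 1.0])
```

Judgments that beat the group average receive a positive advantage and are reinforced, while the one that fell short receives a negative advantage.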

IMPACT Enhances safety protocols for AI systems operating in physical environments, potentially reducing accidents.

ASIMOV-2.0
VLESA
GRPO