TOOL · arXiv cs.AI English(EN) · 6h

Plan, Watch, Recover: A Benchmark and Architectures for Proactive Procedural Assistance

Researchers have introduced EgoProactive, a new dataset, and Pro²Bench, a benchmark suite, both designed to evaluate proactive procedural assistance systems. These systems provide real-time, step-by-step guidance for tasks and decide autonomously when to interrupt and how to coach users. The benchmark includes explicit annotations for out-of-plan deviations and recovery steps, addressing a key limitation of existing datasets. In the reported experiments, the proposed decoupled planner-interaction architecture, built on models such as Llama 4 and Qwen-3.6-VL, outperformed proprietary and open-weight baselines.

IMPACT Establishes a new benchmark for AI procedural assistance, potentially improving user guidance systems and agent capabilities.

GPT 5.2
Gemini 3.1 Pro
Claude Opus 4.6
Llama 4
EgoProactive
Pro²Bench
Qwen-3.6-VL