PulseAugur

New benchmark Pro²Bench targets proactive AI procedural assistance

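Researchers have introduced EgoProactive, a new dataset, and an accompanying benchmark suite named Pro²Bench for evaluating proactive procedural assistance systems. Such systems give users real-time, step-by-step guidance on a task, deciding autonomously when to interrupt and how to coach. The benchmark includes explicit annotations of unplanned deviations and recovery steps, addressing a key limitation of existing datasets. The proposed decoupled planner-interaction architecture, trained on models such as Llama 4 and Qwen-3.6-VL, outperforms proprietary and open-source baselines in extensive experiments.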

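Impact: Establishes a new benchmark for AI procedural assistance, with the potential to improve user guidance systems and agent capabilities.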

Ranking rationale: This cluster contains one research paper introducing a new benchmark and architecture for AI procedural assistance.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

  1. arXiv cs.AI · Kaustav Kundu, Ritvik Shrivastava, Maxim Arap, Nanshu Wang, Xianhui Zhu, Quintin Fettes, Gautam Tiwari, Parth Suresh, Théo Moutakanni, Alejandro Castillejo Munoz, Allen Bolourchi, Pascale Fung, Pinar Donmez, Babak Damavandi, Anuj Kumar, Seungwhan Moon

    Plan, Watch, Recover: A Benchmark and Architectures for Proactive Procedural Assistance

    arXiv:2606.04970v1 Announce Type: cross Abstract: We envision a proactive multi-modal assistant system which gives users real-time step-by-step guidance on a procedural task, autonomously deciding when to interrupt, and how to coach. However, progress is limited…

  2. Hugging Face Daily Papers

    Plan, Watch, Recover: A Benchmark and Architectures for Proactive Procedural Assistance

    We envision a proactive multi-modal assistant system which gives users real-time step-by-step guidance on a procedural task, autonomously deciding when to interrupt, and how to coach. However, progress is limited by the absence of large-scale, cross-domain bench…