PulseAugur
VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions

New benchmark VitaBench 2.0 tests the personalization capabilities of LLM agents

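Researchers have introduced VitaBench 2.0, a new benchmark for evaluating how well large language model (LLM) agents personalize their behavior and act proactively over long-term user interactions. It addresses a limitation of existing evaluations by focusing on a key part of effective collaboration: inferring user preferences from fragmented, everyday interactions and putting them to use. Experiments with VitaBench 2.0 show that even state-of-the-art LLMs struggle with practical personalization, revealing a significant gap between current capabilities and what real-world agents need.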

Impact: New benchmarks such as VitaBench 2.0 and Persona2Web are essential for driving progress toward more personalized, context-aware AI agents.

Ranking rationale: This cluster covers the release of a new academic benchmark for evaluating AI agents.

AI-generated summary · Google Gemini · from 4 sources.

Sources [4]

  1. arXiv cs.AI TIER_1 English(EN) · Yuxin Chen, Yi Zhang, Zhengzhou Cai, Yaorui Shi, Zhiyuan Yao, Chenhang Cui, Jingnan Zheng, Yaqi Huo, Xi Su, Qi Gu, Xunliang Cai, Xiang Wang, An Zhang, Tat-Seng Chua ·

    VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions

    arXiv:2605.27141v1 Announce Type: new Abstract: Large language models (LLMs) have evolved into interactive agents that collaborate with users in real-world tasks. Effective collaboration in such settings increasingly depends on understanding the user beyond what is explicitly sta…

  2. arXiv cs.AI TIER_1 English(EN) · Serin Kim, Sangam Lee, Dongha Lee ·

    Persona2Web: Benchmarking Personalized Web Agents for Contextual Reasoning with User History

    arXiv:2602.17003v2 Announce Type: replace-cross Abstract: Large language models have advanced web agents, yet current agents lack personalization capabilities. Since users rarely specify every detail of their intent, practical web agents must be able to interpret ambiguous querie…

  3. arXiv cs.AI TIER_1 English(EN) · Tat-Seng Chua ·

    VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions

    Large language models (LLMs) have evolved into interactive agents that collaborate with users in real-world tasks. Effective collaboration in such settings increasingly depends on understanding the user beyond what is explicitly stated, as user intent is often reflected in fragme…

  4. Hugging Face Daily Papers TIER_1 English(EN) ·

    VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions

    VitaBench 2.0 evaluates personalized and proactive agent behavior in long-term user interactions by requiring continuous extraction and updating of user preferences from fragmented interactions.