PulseAugur

Meituan LongCat releases VitaBench 2.0 for LLM user modeling

Meituan's LongCat team has released VitaBench 2.0, a benchmark for evaluating large language models in long-term, dynamic user interactions. The new version measures how well models personalize their responses and act proactively in real-life scenarios, and it builds on VitaBench 1.0, released last October.

IMPACT Provides a new standard for evaluating LLM capabilities in long-term user interactions.

RANK_REASON Release of a new benchmark for evaluating LLMs.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. 36氪 (36Kr) TIER_1 中文(ZH)

    Meituan LongCat Open Sources VitaBench 2.0

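    36Kr has learned that, following the release of VitaBench 1.0 last October, Meituan's LongCat team has now launched VitaBench 2.0. VitaBench 2.0 is the first agent evaluation benchmark for long-term, dynamic user modeling in real-life scenarios. It systematically evaluates the personalization and proactivity of large language models in long-term, realistic, dynamic user interactions.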