Meituan's LongCat team has released VitaBench 2.0, an evaluation benchmark designed for assessing large language models in long-term, dynamic user interaction scenarios. This new version focuses on the models' ability to personalize and act proactively in real-life situations, building upon the foundation of VitaBench 1.0 released last October.
AI IMPACT Provides a new standard for evaluating LLM capabilities in long-term user interactions.
RANK_REASON Release of a new benchmark for evaluating LLMs.
AI-generated summary · Google Gemini · from 1 source.