EngramaBench evaluates long-term conversational memory for LLMs

By PulseAugur Editorial Team · [1 source] · 2026-04-23 02:51

Researchers have introduced EngramaBench, a benchmark for evaluating the long-term conversational memory of large language models. It comprises five personas and one hundred multi-session conversations, with queries that test factual recall, temporal reasoning, and synthesis. In the reported evaluations, GPT-4o with full-context prompting achieved the highest overall score, while a graph-structured memory system called Engrama showed stronger performance on cross-space reasoning.

Impact: Introduces a new benchmark for evaluating LLM long-term memory, which could guide the development of future memory systems.

Ranking rationale: This is a research paper introducing a new benchmark for evaluating LLM memory.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CL · TIER_1 · English (EN) · Julian Acuna · 2026-04-23 02:51

EngramaBench: Evaluating Long-Term Conversational Memory with Structured Graph Retrieval

Large language model assistants are increasingly expected to retain and reason over information accumulated across many sessions. We introduce EngramaBench, a benchmark for long-term conversational memory built around five personas, one hundred multi-session conversations, and on…
