Researchers have developed ArcANE, a new benchmark that evaluates role-playing language agents (RPLAs) on how consistently they portray a character over time. Unlike previous benchmarks that focus on factual recall, ArcANE assesses how well agents adapt to a character's evolving psychological trajectory throughout a narrative. The benchmark is constructed from 17 novels and 80 characters and segments each story into phases, testing agent responses both in in-text scenarios and in novel (previously unseen) scenarios. The results show that conditioning on character arcs significantly improves performance, especially when information is not directly retrievable from the source text.
AI IMPACT This benchmark could drive development of more sophisticated AI agents capable of nuanced, dynamic character portrayal in interactive narratives.
AI-generated summary · Google Gemini · from 2 sources.