A Large-Scale Multi-Dimensional Empirical Study of LLMs for Conversation Summarization
Two new arXiv papers explore the effectiveness of Large Language Models (LLMs) for abstractive summarization. The first paper introduces OmniCSEval, a comprehensive benchmark designed to evaluate LLMs across diverse scenarios, context lengths, and reasoning capabilities, using a novel fact-checking framework. The second paper investigates the impact of reasoning strategies on summarization quality and factual faithfulness, finding that explicit reasoning can sometimes degrade factual grounding and that increasing an LLM's internal reasoning budget does not always improve performance.