Brief · PulseAugur

RESEARCH · arXiv cs.CL English(EN) · 8h · [2 sources]

A Large-Scale Multi-Dimensional Empirical Study of LLMs for Conversation Summarization

Two new arXiv papers examine how well Large Language Models (LLMs) perform at abstractive summarization. The first introduces OmniCSEval, a comprehensive benchmark that evaluates LLMs across diverse scenarios, context lengths, and reasoning capabilities, using a novel fact-checking framework. The second investigates how reasoning strategies affect summarization quality and factual faithfulness. It finds that explicit reasoning can sometimes degrade factual grounding and that increasing an LLM's internal reasoning budget does not always improve performance.

Hugging Face
LLMs
arXiv
DagsHub
Large Reasoning Models
alphaXiv
ScienceCast
CatalyzeX
Gotit.pub
OmniCSEval
Haohan Yuan