Two new arXiv papers explore the effectiveness of Large Language Models (LLMs) for abstractive summarization. The first introduces OmniCSEval, a comprehensive benchmark that evaluates LLMs across diverse scenarios, context lengths, and reasoning capabilities using a novel fact-checking framework. The second investigates how reasoning strategies affect summarization quality and factual faithfulness. It finds that explicit reasoning can sometimes degrade factual grounding, and that increasing an LLM's internal reasoning budget does not always improve performance.
- alphaXiv
- arXiv
- CatalyzeX
- DagsHub
- Gotit.pub
- Haohan Yuan
- Hugging Face
- Large Reasoning Models
- LLMs
- OmniCSEval
- ScienceCast
AI-generated summary · Google Gemini · from 2 sources.