PulseAugur

New dataset challenges LLMs on full-text related work generation

Researchers have introduced OARelatedWork, a dataset for generating related work sections of academic papers. It includes the full texts of cited papers, moving beyond abstract-only summarization. Initial benchmarks show that even advanced LLMs such as GPT-4o-mini struggle to synthesize information from very long full-text contexts, with performance dropping significantly compared to abstract-only generation. The study also analyzed human writing habits and found that authors often make abstractive claims that are not directly supported by any localized passage of the cited text, so LLMs outperform humans on strict factuality.

IMPACT Highlights the difficulty LLMs have in synthesizing information from extensive full-text documents, which could guide future model development for academic writing.

RANK_REASON The cluster describes a new academic dataset and associated research paper, including model benchmarks.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.CL · TIER_1 · English (EN) · Martin Docekal, Martin Fajcik, Pavel Smrz

    OARelatedWork: A Large-Scale Dataset of Related Work Sections with Full-texts from Open Access Sources

    arXiv:2405.01930v2. Abstract: This paper introduces OARelatedWork: a dataset for related work generation from open-access sources. It is the first large-scale multi-document summarization dataset for related work generation, containing whole related work sec…