OARelatedWork: A Large-Scale Dataset of Related Work Sections with Full-texts from Open Access Sources
Researchers have introduced OARelatedWork, a new dataset for generating related work sections in academic papers. It stands out by including the full texts of cited papers, moving beyond abstract-only summarization. Initial benchmarks show that even advanced LLMs such as GPT-4o-mini struggle to synthesize information from very long full-text contexts, with performance dropping significantly compared to abstract-only generation. The study also analyzed human writing habits and found that authors often make abstractive claims that are not directly supported by any localized passage of text, so LLMs score higher than humans on strict factuality.
AI IMPACT: Highlights the challenges LLMs face in synthesizing information from extensive full-text documents, potentially guiding future model development for academic writing.