PulseAugur

New X+Slides benchmark evaluates LLMs for audience-conditioned slide generation

Researchers have introduced X+Slides, a benchmark that evaluates how well large language models (LLMs) tailor generated slide decks to a target audience. Earlier benchmarks focused on completeness and technical depth; X+Slides also accounts for audience-specific needs, such as specialists requiring proofs and decision-makers seeking conclusions. The benchmark uses a dynamic evaluation framework with 8,133 probes across 113 topics and seven presentation scenes, and reports metrics such as Audience Coverage, Domain-wise Coverage, Efficiency, and Correctness. Initial experiments on systems such as DeepPresenter and NotebookLM indicate that current systems convey a significant portion of audience-essential information but still leave room for improvement.

IMPACT This benchmark could drive improvements in LLM-generated content by focusing on audience adaptation, leading to more effective communication tools.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.AI TIER_1 English(EN) · Haodong Chen, Xuanhe Zhou, Wei Zhou, Xinyue Shao, Yanbing Zhu, Bo Wang, Jiawei Hong, Anya Jia, Fan Wu

    X+Slides: Benchmarking Audience-Conditioned Slide Generation

    arXiv:2606.19256v1. Abstract: Automatically generating slide decks from source documents is an important application of large language models (LLMs). Existing benchmarks primarily assess slide completeness and technical depth, while overlooking the target audien…

  2. arXiv cs.AI TIER_1 English(EN) · Fan Wu

    X+Slides: Benchmarking Audience-Conditioned Slide Generation

    Automatically generating slide decks from source documents is an important application of large language models (LLMs). Existing benchmarks primarily assess slide completeness and technical depth, while overlooking the target audience as a critical real-world factor. For instance…