PulseAugur

AI drafts boost audio description quality, but quality threshold is key

Researchers have developed methods to improve the quality and scalability of audio description (AD) generation and evaluation. One study introduces GenAD and RefineAD, a pipeline and interface that use AI-generated drafts to significantly cut AD authoring time, provided the drafts meet a quality threshold. Another paper proposes a workflow that uses Item Response Theory to evaluate the proficiency of both human and Vision-Language Model (VLM) raters for AD quality control, finding that top VLMs can approach human rating levels but lack human-like reasoning. A third study highlights the unreliability of zero-shot VLM safety classifiers caused by prompt-induced score variance and suggests prompt-family evaluation with mean aggregation as a standard baseline.
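As a rough illustration of the Item Response Theory idea mentioned in the summary, the sketch below uses the simplest one-parameter (Rasch) form, in which the chance that a rater matches a reference rating depends only on the gap between rater proficiency and item difficulty. The summary does not say which IRT model the paper uses, so the model choice, the rater names, and all numeric values here are assumptions for illustration only.

```python
import math

def rasch_probability(proficiency: float, difficulty: float) -> float:
    """Rasch (1PL) model: logistic function of proficiency minus difficulty."""
    return 1.0 / (1.0 + math.exp(-(proficiency - difficulty)))

def expected_matches(proficiency: float, difficulties: list[float]) -> float:
    """Expected number of items on which a rater matches the reference rating."""
    return sum(rasch_probability(proficiency, d) for d in difficulties)

# Hypothetical raters and item difficulties (not taken from the paper).
raters = {"human_expert": 1.5, "vlm_a": 1.0, "novice": -0.5}
item_difficulties = [-1.0, 0.0, 1.0]

for name, proficiency in raters.items():
    score = expected_matches(proficiency, item_difficulties)
    print(f"{name}: expected matches = {score:.2f} of {len(item_difficulties)}")
```

In this toy setup a rater whose proficiency equals an item's difficulty has a 50% chance of matching the reference on that item, and higher proficiency always raises the expected number of matches.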

Summary written by gemini-2.5-flash-lite from 4 sources.

IMPACT These papers explore improving AI-assisted content creation and evaluation, potentially leading to more accessible digital media and more reliable AI safety assessments.

RANK_REASON The cluster contains multiple academic papers detailing novel research in AI applications and evaluation methodologies.


COVERAGE [4]

  1. arXiv cs.AI TIER_1 · Lana Do, Shasta Ihorn, Charity M. Pitcher-Cooper, Sanjay Mirani, Gio Jung, Hyunjoo Shim, Zhenzhen Qin, Kien T. Nguyen, Vassilis Athitsos, Ilmi Yoon ·

    Making AI Drafts Count: A Quality Threshold in Audio Description Workflows

    arXiv:2605.05348v1 Announce Type: cross Abstract: Audio description (AD) narrates visual elements in video for blind and low-vision audiences. Recent work has shown that giving novice describers an AI-generated draft to start from helps produce higher-quality AD and lowers the ba…

  2. arXiv cs.AI TIER_1 · Lana Do, Gio Jung, Juvenal Francisco Barajas, Andrew Taylor Scott, Shasta Ihorn, Alexander Mario Blum, Vassilis Athitsos, Ilmi Yoon ·

    Toward Scalable Audio Description Quality Control: A Workflow for Evaluating Human and VLM Raters

    arXiv:2602.01390v2 Announce Type: replace-cross Abstract: Digital video is central to communication, education, and entertainment, but without audio description (AD), blind and low-vision users are excluded. While crowdsourced platforms and vision-language models (VLMs) expand AD…

  3. arXiv cs.CV TIER_1 · Charles Weng, Dingwen Li, Alexander Martin ·

    Prompt-Induced Score Variance in Zero-Shot Binary Vision-Language Safety Classification

    arXiv:2605.00326v1 Announce Type: cross Abstract: Single-prompt first-token probabilities from zero-shot vision-language model (VLM) safety classifiers are treated as decision scores, but we show they are unreliable under semantically equivalent prompt reformulation: even when th…

  4. arXiv cs.CV TIER_1 · Alexander Martin ·

    Prompt-Induced Score Variance in Zero-Shot Binary Vision-Language Safety Classification

    Single-prompt first-token probabilities from zero-shot vision-language model (VLM) safety classifiers are treated as decision scores, but we show they are unreliable under semantically equivalent prompt reformulation: even when the binary label is constrained to a fixed output po…
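The prompt-induced score variance described in the last two entries, and the mean-aggregation baseline named in the summary, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the scores, the 0.5 threshold, and the function name `aggregate_prompt_family` are hypothetical, and the scores stand in for first-token "unsafe" probabilities that a VLM might return for one image under four semantically equivalent prompts.

```python
from statistics import mean, pstdev

def aggregate_prompt_family(scores: list[float]) -> tuple[float, float]:
    """Return (mean, population std dev) of per-prompt scores for one input."""
    if not scores:
        raise ValueError("need at least one prompt score")
    return mean(scores), pstdev(scores)

# Hypothetical scores for ONE image under four paraphrased prompts.
family_scores = [0.62, 0.41, 0.55, 0.38]
THRESHOLD = 0.5  # assumed decision threshold

# Single-prompt decisions flip depending on which paraphrase is used.
single_prompt_decisions = [s >= THRESHOLD for s in family_scores]

# Prompt-family decision: threshold the mean score instead.
mean_score, spread = aggregate_prompt_family(family_scores)
family_decision = mean_score >= THRESHOLD

print("single-prompt decisions:", single_prompt_decisions)
print(f"family mean = {mean_score:.2f}, spread = {spread:.3f}")
print("family decision:", family_decision)
```

With these made-up numbers, two of the four paraphrases would flag the image and two would not, while the averaged score gives a single decision and the spread reports how much the prompts disagree.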