Researchers have developed methods to improve the quality and scalability of audio description (AD) generation and evaluation. One study introduces GenAD and RefineAD, a pipeline and interface that use AI-generated drafts to significantly cut AD authoring time, provided the drafts meet a certain quality threshold. Another paper proposes a workflow using Item Response Theory to evaluate the proficiency of both human and Vision-Language Model (VLM) raters for AD quality control, finding that top VLMs can approach human rating levels but lack human-like reasoning. A third study highlights the unreliability of zero-shot VLM safety classifiers due to prompt-induced score variance, suggesting prompt-family evaluation with mean aggregation as a standard baseline.
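The prompt-family baseline named in the third study can be illustrated with a minimal sketch. The scores, the family of five paraphrased prompts, and the variable names below are all hypothetical, not drawn from the paper; the sketch only shows the aggregation idea: score the same input under several prompt phrasings, then report the mean instead of any single prompt's score.

```python
from statistics import mean, pstdev

# Hypothetical scores a zero-shot VLM safety classifier might return for
# the SAME image under five paraphrased safety prompts (a "prompt family").
prompt_family_scores = [0.81, 0.42, 0.77, 0.55, 0.69]

# Single-prompt evaluation picks one arbitrary phrasing; its score can
# swing widely across paraphrases, which is the variance problem described.
single_prompt_score = prompt_family_scores[0]

# Prompt-family baseline: aggregate by the mean to damp prompt-induced variance.
family_mean = mean(prompt_family_scores)

# Population std. dev. across the family quantifies how prompt-sensitive
# the classifier is on this input.
family_spread = pstdev(prompt_family_scores)

print(f"single={single_prompt_score:.2f} mean={family_mean:.3f} spread={family_spread:.3f}")
```

Reporting the spread alongside the mean makes it visible when a classifier's verdict depends more on prompt wording than on the content being judged.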
Summary written by gemini-2.5-flash-lite from 4 sources.
IMPACT These papers explore improving AI-assisted content creation and evaluation, potentially leading to more accessible digital media and more reliable AI safety assessments.
RANK_REASON The cluster contains multiple academic papers detailing novel research in AI applications and evaluation methodologies.