Making AI Drafts Count: A Quality Threshold in Audio Description Workflows

AI drafts improve audio description quality, but the quality threshold is key

By the PulseAugur editorial team · [4 sources] · 2026-05-01 01:06

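Researchers have developed methods to improve the quality and scalability of audio description (AD) generation and evaluation. One study introduces GenAD and RefineAD, a pipeline and an interface that use AI-generated drafts to significantly shorten AD authoring time, provided the drafts meet a certain quality threshold. Another paper proposes a workflow that uses item response theory to assess how proficient human and vision-language model (VLM) raters are at AD quality control; it finds that top VLMs can approach human rating levels but lack human-like reasoning. A third study shows that zero-shot VLM safety classifiers are unreliable because their scores vary with the prompt, and recommends prompt-family evaluation with mean aggregation as a standard baseline.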

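Impact: These papers explore improvements to AI-assisted content creation and evaluation, which may lead to more accessible digital media and more reliable AI safety evaluation.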

Ranking rationale: This cluster contains multiple academic papers detailing new research on AI applications and evaluation methods.


AI-generated summary · Google Gemini · from 4 sources.

Sources [4]

arXiv cs.AI TIER_1 English(EN) · Lana Do, Shasta Ihorn, Charity M. Pitcher-Cooper, Sanjay Mirani, Gio Jung, Hyunjoo Shim, Zhenzhen Qin, Kien T. Nguyen, Vassilis Athitsos, Ilmi Yoon · 2026-05-08 04:00

Making AI Drafts Count: A Quality Threshold in Audio Description Workflows

arXiv:2605.05348v1 Announce Type: cross Abstract: Audio description (AD) narrates visual elements in video for blind and low-vision audiences. Recent work has shown that giving novice describers an AI-generated draft to start from helps produce higher-quality AD and lowers the ba…
arXiv cs.AI TIER_1 English(EN) · Lana Do, Gio Jung, Juvenal Francisco Barajas, Andrew Taylor Scott, Shasta Ihorn, Alexander Mario Blum, Vassilis Athitsos, Ilmi Yoon · 2026-05-08 04:00

Toward Scalable Audio Description Quality Control: A Workflow for Evaluating Human and VLM Raters

arXiv:2602.01390v2 Announce Type: replace-cross Abstract: Digital video is central to communication, education, and entertainment, but without audio description (AD), blind and low-vision users are excluded. While crowdsourced platforms and vision-language models (VLMs) expand AD…
arXiv cs.CV TIER_1 English(EN) · Charles Weng, Dingwen Li, Alexander Martin · 2026-05-04 04:00

Prompt-Induced Score Variance in Zero-Shot Binary Vision-Language Safety Classification

arXiv:2605.00326v1 Announce Type: cross Abstract: Single-prompt first-token probabilities from zero-shot vision-language model (VLM) safety classifiers are treated as decision scores, but we show they are unreliable under semantically equivalent prompt reformulation: even when th…
arXiv cs.CV TIER_1 English(EN) · Alexander Martin · 2026-05-01 01:06

Prompt-Induced Score Variance in Zero-Shot Binary Vision-Language Safety Classification

Single-prompt first-token probabilities from zero-shot vision-language model (VLM) safety classifiers are treated as decision scores, but we show they are unreliable under semantically equivalent prompt reformulation: even when the binary label is constrained to a fixed output po…
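
The baseline recommended in the summary, prompt-family evaluation with mean aggregation, can be sketched roughly as follows. This is a minimal illustration rather than the paper's implementation: the function names, the 0.5 threshold, the example prompts, and the probability values are all hypothetical, and the per-prompt scores are assumed to be first-token "unsafe" probabilities already obtained from a VLM.

```python
from statistics import mean, pstdev


def aggregate_prompt_family(scores_by_prompt):
    """Return (mean, population std dev) of scores across paraphrased prompts.

    scores_by_prompt maps each semantically equivalent prompt to the score
    the classifier produced for the same image (hypothetical input format).
    """
    values = list(scores_by_prompt.values())
    if not values:
        raise ValueError("need at least one prompt score")
    return mean(values), pstdev(values)


def classify(scores_by_prompt, threshold=0.5):
    """Decide on the mean score instead of any single prompt's score."""
    avg, spread = aggregate_prompt_family(scores_by_prompt)
    return {"score": avg, "spread": spread, "unsafe": avg >= threshold}


# Made-up scores for one image under four paraphrases (illustration only).
example_scores = {
    "Is this image unsafe? Answer yes or no.": 0.62,
    "Does this image contain unsafe content? Answer yes or no.": 0.41,
    "Would you flag this image as unsafe? Answer yes or no.": 0.55,
    "Is the content of this image harmful? Answer yes or no.": 0.30,
}

if __name__ == "__main__":
    print(classify(example_scores))
```

With these made-up numbers, two of the four prompts fall above the 0.5 threshold and two fall below it, so a single-prompt decision depends on which paraphrase happened to be used. Averaging over the family gives one decision per image, and the reported spread shows how much the prompts disagreed.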

