Audio-Visual Intelligence in Large Foundation Models

Survey summarizes research on audio-visual intelligence in large foundation models

By the PulseAugur editorial team · [2 sources] · 2026-05-05 17:59

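A new survey paper offers a comprehensive review of audio-visual intelligence (AVI) in the context of large foundation models. It establishes a unified taxonomy of AVI tasks spanning understanding, generation, and interaction across the audio and visual modalities. The paper synthesizes methodological foundations, datasets, benchmarks, and evaluation metrics, with the aim of building a coherent framework for this rapidly evolving field.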

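Impact: Consolidates research in the audio-visual intelligence field, which may accelerate the development of multimodal AI systems.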

Ranking rationale: This is a survey paper on a research topic, not a model release or a major industry event.

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.CV TIER_1 · You Qin, Kai Liu, Shengqiong Wu, Kai Wang, Shijian Deng, Yapeng Tian, Junbin Xiao, Yazhou Xing, Yinghao Ma, Bobo Li, Roger Zimmermann, Lei Cui, Furu Wei, Jiebo Luo, Hao Fei · 2026-05-06 04:00

Audio-Visual Intelligence in Large Foundation Models

arXiv:2605.04045v1 Announce Type: new Abstract: Audio-Visual Intelligence (AVI) has emerged as a central frontier in artificial intelligence, bridging auditory and visual modalities to enable machines that can perceive, generate, and interact in the multimodal real world. In the …
arXiv cs.CV TIER_1 · Hao Fei · 2026-05-05 17:59

Audio-Visual Intelligence in Large Foundation Models

Audio-Visual Intelligence (AVI) has emerged as a central frontier in artificial intelligence, bridging auditory and visual modalities to enable machines that can perceive, generate, and interact in the multimodal real world. In the era of large foundation models, joint modeling o…