SSNAPS: Audio-Visual Separation of Speech and Background Noise with Diffusion Inverse Sampling

New SSNAPS method uses diffusion for audio-visual speech separation

By the PulseAugur editorial team · [1 source] · 2026-06-16 04:00

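Researchers have developed SSNAPS, a new unsupervised method that uses audio-visual cues to separate speech from background noise. It relies on diffusion inverse sampling, reconstructing every source by modeling separate diffusion priors for clean speech and for environmental noise. Across a range of noisy conditions it achieves a lower word error rate than supervised baselines, and it can even handle multi-speaker and off-screen separation. Because the separated noise component is reconstructed with high fidelity, it can also support downstream soundscape detection.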
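The headline comparison above is in word error rate (WER), conventionally defined as (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions needed to turn a speech recognizer's transcript into the reference, and N is the number of reference words. A minimal Python sketch of that definition follows. It assumes plain whitespace tokenization with no case or punctuation normalization, and the function name `word_error_rate` is hypothetical, not taken from the paper or from any scoring toolkit.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N using a word-level Levenshtein distance.

    Assumption: words are split on whitespace only; no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference
    # words and the first j hypothesis words (one DP row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match if cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` returns 2/6 ≈ 0.333 (one substitution, one deletion). Because insertions are counted, WER can exceed 1.0.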

Ranking rationale: this cluster contains one paper describing a new audio-visual speech separation method.

Read on arXiv cs.LG

AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.LG · Yochai Yemini, Yoav Ellinson, Rami Ben-Ari, Sharon Gannot, Ethan Fetaya · 2026-06-16 04:00

SSNAPS: Audio-Visual Separation of Speech and Background Noise with Diffusion Inverse Sampling

arXiv:2602.01394v2 · Announce Type: replace-cross

Abstract: This paper addresses the challenge of audio-visual single-microphone speech separation and enhancement in the presence of real-world environmental noise. Our approach is based on generative inverse sampling, where we model…
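The abstract frames separation as generative inverse sampling: given a single-microphone mixture, sample the sources that explain it under separate priors for speech and noise. The sketch below is a hedged toy illustration of that idea, not the authors' method. It assumes an additive mixture y = s + n, replaces the learned diffusion priors with Gaussian priors whose scores are analytic, omits the diffusion noise schedule and all visual conditioning, and draws s from p(s) p_n(y - s) with unadjusted Langevin dynamics; the noise estimate is then y - s. All names (`SIGMA_S`, `SIGMA_N`, `speech_score`, `noise_score`, `separate`) are hypothetical.

```python
import math
import random

# Toy assumption: analytic Gaussian priors stand in for learned diffusion priors.
SIGMA_S = 1.0  # hypothetical std of the clean-speech prior
SIGMA_N = 0.5  # hypothetical std of the environmental-noise prior


def speech_score(s: float) -> float:
    """Gradient of log N(s; 0, SIGMA_S^2) with respect to s."""
    return -s / SIGMA_S**2


def noise_score(n: float) -> float:
    """Gradient of log N(n; 0, SIGMA_N^2) with respect to n."""
    return -n / SIGMA_N**2


def separate(y, steps=2000, step_size=1e-3, seed=0):
    """Sample speech s given mixture y = s + n with unadjusted Langevin dynamics.

    The mixture constraint is enforced exactly by setting n = y - s, so the
    posterior score is speech_score(s) - noise_score(y - s); the minus sign
    comes from the chain rule, since d(y - s)/ds = -1.
    """
    rng = random.Random(seed)
    s = [0.0 for _ in y]
    for _ in range(steps):
        s = [
            si
            + step_size * (speech_score(si) - noise_score(yi - si))
            + math.sqrt(2 * step_size) * rng.gauss(0.0, 1.0)
            for si, yi in zip(s, y)
        ]
    # Both sources are returned: the noise is whatever the speech does not explain.
    n = [yi - si for si, yi in zip(s, y)]
    return s, n
```

With these Gaussian assumptions the exact posterior mean of each speech sample is σ_s² / (σ_s² + σ_n²) · y = 0.8 · y, so for y = 2.0 the average over many samples should land near 1.6. In a diffusion-based system the two score functions would instead be trained networks evaluated along a decreasing noise schedule.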
