PulseAugur

Apple researchers develop StereoFoley for object-aware stereo audio generation from video

Apple researchers have developed StereoFoley, a video-to-audio generation framework that produces semantically aligned, temporally synchronized, and spatially accurate stereo sound at 48 kHz. The system addresses a limitation of existing models, the absence of object-aware stereo imaging, and overcomes the lack of suitable datasets through a synthetic data generation pipeline. This pipeline combines video analysis, object tracking, and audio synthesis with dynamic panning and distance controls to produce realistic soundscapes, setting a new benchmark for video-to-audio generation.
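
To make "dynamic panning and distance controls" concrete, here is a minimal Python sketch of one common way to place a mono sound in a stereo field from a tracked object's position. It is an illustration only, not StereoFoley's actual pipeline: the function name `pan_mono_to_stereo`, the normalized horizontal position `x` (0 = left edge of the frame, 1 = right edge), the per-sample `distances`, and `ref_distance` are all assumptions introduced here. It uses a constant-power pan law and simple inverse-distance attenuation.

```python
import math


def pan_mono_to_stereo(samples, x_positions, distances, ref_distance=1.0):
    """Pan a mono signal to stereo, one position and distance per sample.

    Hypothetical helper for illustration; not taken from the StereoFoley paper.
    samples      -- mono audio samples (floats)
    x_positions  -- horizontal object position per sample, 0.0 (left) to 1.0 (right)
    distances    -- object distance per sample, in the same unit as ref_distance
    ref_distance -- distance at (or below) which no attenuation is applied
    """
    if not (len(samples) == len(x_positions) == len(distances)):
        raise ValueError("samples, x_positions and distances must have equal length")

    left, right = [], []
    for s, x, d in zip(samples, x_positions, distances):
        x = min(max(x, 0.0), 1.0)                # clamp position to the frame
        theta = x * math.pi / 2                  # map [0, 1] to [0, pi/2]
        gain = ref_distance / max(d, ref_distance)  # inverse-distance attenuation
        # Constant-power pan: cos^2 + sin^2 = 1, so total power is independent of x.
        left.append(s * math.cos(theta) * gain)
        right.append(s * math.sin(theta) * gain)
    return left, right


# Example: an object moving left to right while receding from the camera.
left, right = pan_mono_to_stereo(
    samples=[1.0, 1.0, 1.0],
    x_positions=[0.0, 0.5, 1.0],
    distances=[1.0, 1.0, 2.0],
)
```

Because the positions and distances vary per sample, the pan follows the object over time; in practice these trajectories would come from an object tracker and would usually be smoothed to avoid audible jumps.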

Impact: Sets a new benchmark for generating spatially accurate stereo audio from video content.

Ranking rationale: This is a research paper detailing a new framework for audio generation from video.

Read at Apple Machine Learning Research →

AI-generated summary · Google Gemini · From 1 source.


Sources [1]

  1. Apple Machine Learning Research · Tier 1 · English (EN)

    StereoFoley: Object-Aware Stereo Audio Generation from Video

    We present StereoFoley, a video-to-audio generation framework that produces semantically aligned, temporally synchronized, and spatially accurate stereo sound at 48 kHz. While recent generative video-to-audio models achieve strong semantic and temporal fidelity, they largely rema…