PulseAugur

Apple researchers develop StereoFoley for object-aware stereo audio generation from video

Apple researchers have developed StereoFoley, a video-to-audio generation framework that produces semantically aligned, temporally synchronized, and spatially accurate stereo sound at 48 kHz. Where existing models fall short on spatial accuracy, StereoFoley adds object-aware stereo imaging. To work around the lack of suitable training datasets, the researchers built a synthetic data generation pipeline that combines video analysis, object tracking, and audio synthesis with dynamic panning and distance controls to produce realistic soundscapes. The work is presented as setting a new benchmark for video-to-audio generation.
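To make "dynamic panning and distance controls" concrete, here is a minimal sketch of one common way to place a mono sound in a stereo field. It is not taken from the StereoFoley paper: the constant-power pan law, the inverse-distance attenuation, and the names `stereo_gains`, `pan_mono`, `azimuth`, `distance`, and `ref_distance` are all illustrative assumptions.

```python
import math


def stereo_gains(azimuth, distance, ref_distance=1.0):
    """Return (left, right) gains for a mono source.

    azimuth:  -1.0 (hard left) to 1.0 (hard right). Assumed here to be a
              tracked object's normalized horizontal position in the frame.
    distance: estimated source distance in arbitrary units (assumption).
    """
    pan = max(-1.0, min(1.0, azimuth))
    # Constant-power pan law: map pan to an angle in [0, pi/2] so that
    # left^2 + right^2 stays constant as the source moves.
    theta = (pan + 1.0) * math.pi / 4.0
    # Inverse-distance attenuation, clamped so that sources closer than
    # ref_distance are not boosted above unity gain.
    atten = ref_distance / max(distance, ref_distance)
    return atten * math.cos(theta), atten * math.sin(theta)


def pan_mono(samples, azimuths, distances):
    """Apply time-varying gains to a mono signal; returns (L, R) pairs."""
    out = []
    for s, az, d in zip(samples, azimuths, distances):
        left, right = stereo_gains(az, d)
        out.append((s * left, s * right))
    return out
```

Feeding `pan_mono` a per-sample azimuth and distance track (for example, interpolated from per-frame object positions) makes the sound follow the object across the stereo image and grow quieter as it recedes.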

Summary written by gemini-2.5-flash-lite from 1 source.

IMPACT Sets a new benchmark for generating spatially accurate stereo audio from video content.

RANK_REASON This is a research paper detailing a new framework for audio generation from video.


COVERAGE [1]

  1. Apple Machine Learning Research (TIER_1)

    StereoFoley: Object-Aware Stereo Audio Generation from Video

    We present StereoFoley, a video-to-audio generation framework that produces semantically aligned, temporally synchronized, and spatially accurate stereo sound at 48 kHz. While recent generative video-to-audio models achieve strong semantic and temporal fidelity, they largely rema…