AudioX-Turbo: A Unified Framework for Efficient Anything-to-Audio Generation
Researchers have introduced AudioX-Turbo, a framework for efficiently generating audio from multimodal inputs such as text, video, and audio signals. The system uses teacher-student distillation: a high-fidelity teacher model, AudioX-Base, is distilled into a faster student model, AudioX-Turbo. Distillation reduces the number of sampling steps needed for generation, making the student approximately 25 times more efficient than existing multi-step baselines. To support the framework, the researchers also created a large dataset, IF-caps-Pro, containing around 9.2 million samples.
AI IMPACT: This framework offers a substantial efficiency gain for multimodal audio generation, potentially enabling broader applications and faster iteration in AI-powered audio creation.