LibriConvo: Simulating Conversations from Read Literature for ASR and Diarization
Researchers have developed LibriConvo, a synthetic conversational speech corpus designed to improve automatic speech recognition (ASR) and speaker diarization systems. The corpus adapts the Speaker-Aware Simulated Conversation framework: conversational timing is derived from existing English CallHome data, and LibriTTS utterances are grouped by book to preserve semantic continuity. LibriConvo contains over 240 hours of audio from 830 speakers. In the baseline results, Sortformer (for speaker diarization) and a fine-tuned Fast Conformer-CTC XLarge (for ASR) outperform the existing systems they were compared against on this benchmark.
AI IMPACT: Provides a new benchmark for evaluating and improving multi-speaker speech processing systems.
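
The construction described above (utterances grouped by book, turn timing drawn from real conversational statistics) can be illustrated with a minimal sketch. This is not the LibriConvo pipeline: the function names, the dictionary fields (`speaker`, `book`, `duration`), and the `gaps` pool are all hypothetical stand-ins, with `gaps` playing the role of between-turn offsets that would in practice be estimated from CallHome.

```python
import random
from collections import defaultdict


def group_by_book(utterances):
    """Group utterance dicts by their 'book' field (hypothetical schema)."""
    books = defaultdict(list)
    for utt in utterances:
        books[utt["book"]].append(utt)
    return dict(books)


def simulate_conversation(turns, gaps, rng):
    """Place ordered turns on a shared timeline.

    turns: list of dicts with 'speaker' and 'duration' (seconds).
    gaps:  pool of between-turn offsets in seconds; a negative value
           means the next turn overlaps the previous speech.
    rng:   a random.Random instance, so runs are reproducible.
    Returns a list of (speaker, start, end) tuples.
    """
    timeline = []
    prev_end = 0.0
    for i, turn in enumerate(turns):
        if i == 0:
            start = 0.0
        else:
            # Sample an offset relative to the end of speech so far;
            # clamp so that no turn starts before time zero.
            start = max(0.0, prev_end + rng.choice(gaps))
        end = start + turn["duration"]
        timeline.append((turn["speaker"], start, end))
        prev_end = max(prev_end, end)
    return timeline


if __name__ == "__main__":
    utterances = [
        {"speaker": "A", "book": "b1", "duration": 2.0},
        {"speaker": "B", "book": "b1", "duration": 3.0},
        {"speaker": "C", "book": "b2", "duration": 1.5},
    ]
    # Made-up offsets for illustration only, not CallHome statistics.
    gaps = [-0.4, 0.1, 0.3, 0.8]
    rng = random.Random(0)
    for book, turns in group_by_book(utterances).items():
        print(book, simulate_conversation(turns, gaps, rng))
```

Sampling from an empirical pool rather than a fitted distribution keeps the sketch short; the resulting `(speaker, start, end)` tuples are the kind of information a diarization reference annotation records.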