Researchers have developed LibriConvo, a new synthetic conversational speech corpus designed to improve automatic speech recognition (ASR) and speaker diarization systems. The corpus was created by adapting the Speaker-Aware Simulated Conversation framework, processing existing English CallHome data for conversational timing and using LibriTTS utterances grouped by book for semantic continuity. LibriConvo contains over 240 hours of audio featuring 830 speakers, and baseline results show that models like Sortformer and a fine-tuned Fast Conformer-CTC XLarge outperform existing systems on this benchmark.
AI IMPACT Provides a new benchmark for evaluating and improving multi-speaker speech processing systems.
RANK_REASON The cluster contains a research paper detailing a new synthetic dataset and benchmark for speech processing tasks.
- English CallHome
- Fast Conformer-CTC XLarge
- LibriConvo
- LibriTTS
- pyannote
- Serialized Output Training
- Sortformer
- Speaker-Aware Simulated Conversation
- Whisper-large-v3
AI-generated summary · Google Gemini · from 1 source.