Researchers have introduced Raon-Speech, a 9-billion-parameter speech language model that can understand, answer, and generate speech in English and Korean. Trained on more than 1.38 million hours of curated speech and text data, the model outperforms similarly sized audio foundation models on speech-centric tasks while retaining strong text-based question-answering ability. An extension, Raon-SpeechChat, receives additional training on dialogue data to support real-time, full-duplex conversation, and shows strengths in turn-taking and sensitivity to interruptions.
IMPACT Among similarly sized models, Raon-Speech sets a new mark for speech understanding and generation, which could improve human-computer interaction and real-time conversational AI.
RANK_REASON The cluster contains an arXiv paper detailing a new speech language model.
AI-generated summary · Google Gemini · from 1 source.