Researchers have introduced Raon-Speech, a 9-billion parameter speech language model capable of understanding, answering, and generating speech in English and Korean. This model, trained on over 1.38 million hours of curated speech and text data, outperforms similarly sized audio foundation models on speech-centric tasks while maintaining strong text-based question-answering abilities. An extension, Raon-SpeechChat, further enhances real-time, full-duplex conversation capabilities through additional training on dialogue data, demonstrating strengths in turn-taking and interruption sensitivity. AI
影响 This new speech language model sets a new benchmark for speech understanding and generation, potentially improving human-computer interaction and real-time conversational AI.
排序理由 The cluster contains an arXiv paper detailing a new speech language model. [lever_c_demoted from research: ic=1 ai=1.0]
AI 生成摘要 · Google Gemini · 来自 1 个来源。 我们如何撰写摘要 →