Microsoft has released VibeVoice, an open-source speech-to-text model with built-in speaker diarization. The MIT-licensed model can be deployed locally, so audio data does not need to be sent to an external API. One user who tested the model on a MacBook Pro transcribed an hour of audio in under nine minutes, though the run required a significant amount of RAM.
Impact: Provides a self-hostable, open-source alternative for speech-to-text transcription, potentially reducing operational costs for developers.
Ranking rationale: An open-source model release from a major company, but not a frontier model release from a top-tier AI lab.
AI-generated summary · Google Gemini · from 6 sources.