Microsoft has released VibeVoice, an open-source speech-to-text model with built-in speaker diarization. The MIT-licensed model can be run locally, so audio data does not need to be sent to an external API. One user who tested it on a MacBook Pro reported transcribing an hour of audio in under nine minutes, though the model required a large amount of RAM.
AI IMPACT Provides a self-hostable, open-source alternative for speech-to-text transcription, potentially reducing operational costs for developers.
RANK_REASON Open-source model release from a major company, but not a frontier model from a top-tier AI lab.
AI-generated summary · Google Gemini · from 6 sources.