Deepgram
PulseAugur coverage of Deepgram — every cluster mentioning Deepgram across labs, papers, and developer communities, ranked by signal.
-
Developer builds privacy-first AI app using local audio capture
A developer built a privacy-focused AI application called Plan AI that avoids intrusive meeting bots by capturing system audio locally. The application uses Electron for the desktop interface and a distributed pipeli…
-
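Together AI releases Voice Finder, supporting more than 600 TTS voices
Together AI has launched Voice Finder, a new tool designed to help developers quickly choose the best voice for their application from a catalog of more than 600 options. The tool lets users search for voices by describing the desired characteristics or by uploading an audio sample for comparison. Voice Finder classifies each voice on more than 15 attributes, such as pitch, accent, and emotion, to simplify voice selection for voice agents.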
-
Curated learning path guides developers in building real-time voice AI agents
A new GitHub repository, "Voice-AI-for-Beginners," offers a structured learning path for developers to build real-time voice AI agents. The guide covers the entire process from initial speech-to-text calls to scaling pr…
-
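Speech models struggle to recognize street names, especially for non-native speakers
Researchers at Together AI found that current state-of-the-art speech recognition models have a significant failure rate, with an average error rate of 39% when transcribing street names. Non-native English speakers fare worse: what they say is 18% more likely to be misinterpreted. This inaccuracy can have serious real-world consequences, such as longer travel times and higher costs for services like ride-hailing. The research shows that a synthetic data generation technique called "cross-lingual style transfer" can improve transcription accuracy by up to 60% with only a very small amount of training data.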
-
Rowboat launches open-source AI coworker that builds knowledge graphs
Rowboat, an open-source AI coworker, has been released, allowing users to create a personal knowledge graph from their work data. This tool connects to email and meeting notes to build a persistent, local knowledge base…
-
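Together AI launches unified platform for real-time voice agents
Together AI has launched a unified platform for building real-time voice agents that integrates speech-to-text (STT), large language models (LLMs), and text-to-speech (TTS) in a single cloud environment. This co-location is intended to bring latency below 500 milliseconds and to simplify deployment by eliminating network hops across vendors. The platform now natively supports models such as Deepgram's STT and Cartesia Sonic-3 for TTS, giving developers more choice and a more streamlined experience for production-ready voice applications.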
-
April launches voice AI assistant for email and calendar management
April, a new voice-controlled AI assistant, has launched on the App Store to manage emails and calendars. The application allows users to dictate replies, summarize messages, and reschedule meetings hands-free. It utili…
-
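Together AI integrates Deepgram speech models and launches fast Whisper STT
Together AI has launched new speech-to-text (STT) and text-to-speech (TTS) capabilities, integrating Deepgram's advanced speech models alongside its own high-performance Whisper V3 API. The move aims to simplify the development of real-time voice agents by offering a unified platform for transcription, LLM processing, and synthesis. The offerings emphasize speed, accuracy, and enterprise-grade features such as zero data retention and large-file handling, addressing key latency and quality problems in current voice AI applications.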