Whisper
PulseAugur coverage of Whisper — every cluster mentioning Whisper across labs, papers, and developer communities, ranked by signal.
- 2026-05-12 research_milestone A new semi-supervised framework for speech confidence detection was proposed, achieving a Macro-F1 score of 0.751.
Sentiment data available for 11 days
-
BaldWhisper model achieves 48% size reduction and 2.15x speedup
Researchers have developed BaldWhisper, a method to significantly compress and accelerate the Whisper speech-to-text model. By employing low-rank decomposition for embeddings and merging transformer layers, BaldWhisper …
-
Audio-language models struggle with dysarthric speech context, but fine-tuning shows promise
Researchers have developed a benchmark to test if current audio-language models can effectively use additional clinical context to improve automatic speech recognition for dysarthric speech. Initial findings indicate th…
-
Needle model distills Gemini for precise tool-calling tasks
A new 26-million parameter model named Needle has been developed, distilled from Google's Gemini to excel specifically at tool-calling tasks. The core innovation lies not in its size, but in its ability to reliably prod…
-
Researchers enhance elderly ASR with LLM paraphrasing and speech synthesis
Researchers have developed a novel data augmentation technique to improve automatic speech recognition (ASR) for elderly individuals. This method utilizes large language models to paraphrase existing transcripts, genera…
-
WhisperPipe architecture slashes ASR latency and memory use for real-time applications
Researchers have developed WhisperPipe, a new streaming architecture designed to improve real-time automatic speech recognition (ASR) performance. This architecture addresses the trade-off between accuracy and computati…
-
New FADE method improves ASR model quantization for edge devices
Researchers have developed FADE, a novel framework for improving post-training quantization of encoder-decoder Automatic Speech Recognition (ASR) models. This method addresses the issue of error accumulation across laye…
-
Talkie-1930: New 13B AI model trained on pre-1931 text explores historical knowledge
A new project called Talkie has released a 13-billion parameter language model trained exclusively on English text from before 1931. This "vintage" model aims to explore AI's ability to predict the future and generate n…
-
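Speech models perform poorly at recognizing street names, especially for non-native speakers
Researchers at Together AI found that current state-of-the-art speech recognition models fail at a significant rate, with an average error rate of 39% when transcribing street names. Non-native English speakers fare worse: their information is 18% more likely to be misinterpreted. This inaccuracy can have serious real-world consequences, such as longer travel times and higher costs for services like ride-hailing. The research shows that a synthetic data generation technique called "cross-lingual style transfer" can improve transcription accuracy by up to 60% with only a very small amount of training data.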
-
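Speak uses OpenAI's AI for personalized language learning and global expansion
The language learning app Speak is using OpenAI's advanced AI capabilities to create a personalized, highly interactive one-on-one tutoring experience. Founded in 2016, the company has grown significantly alongside advances in speech recognition and large language models, which enable features such as real-time feedback and conversational role-play. Speak's strategy was to focus first on the South Korean market to validate its AI-native model before expanding globally, and the company is now investing in AI-generated curricula to enable personalized learning paths across different domains.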
-
Morgan Stanley leverages OpenAI's GPT-4 to enhance financial advisor services
Morgan Stanley has partnered with OpenAI to integrate GPT-4 into its financial advisory services, enhancing advisor efficiency and client engagement. The firm developed an internal chatbot, AI @ Morgan Stanley Assistant…
-
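Replit launches AI templates to speed up developer onboarding
Replit has launched a set of AI-powered templates designed to simplify developer onboarding and accelerate the creation of AI-driven applications. The templates support multiple programming languages and frameworks, and they simplify the complex setup of tools such as vector databases and large language models. Notable examples include templates for Qdrant vector search, comparing Gemini and GPT-4, building an AI support agent with OpenAI, and transcribing meetings with OpenAI Whisper.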
-
OpenAI launches advanced audio models for API, enhancing voice agents
OpenAI has released new, advanced audio models through its API, enhancing capabilities for voice agents. The updated speech-to-text models, including gpt-4o-transcribe and gpt-4o-mini-transcribe, offer improved accuracy…
-
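Replit integrates OpenAI models for coding assistance and education
Replit has partnered with OpenAI to integrate its advanced AI models into its coding platform. The company is launching a new course on LLMs and GPT and has introduced a beta code-explanation feature powered by OpenAI's Codex model. Replit is also exploring the use of GPT-3 to generate blog content, highlighting the growing synergy between AI and software development environments.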