ENTITY Whisper

Whisper

PulseAugur coverage of Whisper: every cluster mentioning Whisper across labs, papers, and developer communities, ranked by signal.


Total · 30d

59 over 90d

Releases · 30d

0 over 90d

Papers · 30d

29 over 90d

TIER MIX · 90D

frontier release 1
significant 2
research 12
tool 35
commentary 8
meme 1

TOPICS

product 38
other 32
paper 29
model release 16
infra 6
safety 4
funding 2
policy 2

RELATIONSHIPS

developed by OpenAI 100%
invested in Thinking Machines 90%
used by Ollama 70%
used by OpenAI MCP 70%
affiliated with GitHub Copilot MCP 70%
used by Figma MCP 70%
used by Thinking Machines 50%

TIMELINE

2026-06-09 research_milestone A study on fine-tuning OpenAI's Whisper for Swiss German ASR revealed improved performance and identified benchmark contamination issues.
2026-05-12 research_milestone A new semi-supervised framework for speech confidence detection was proposed, achieving a Macro-F1 score of 0.751.

SENTIMENT · 30D

19 day(s) with sentiment data

RECENT · PAGE 3/3 · 59 TOTAL

RESEARCH · CL_25987 · May 11 · 04:00

AI interpretability advances with Sparse Autoencoders for ASR and functional operators

Researchers are exploring advanced techniques for interpreting the internal workings of complex AI models. One paper details the application of Sparse Autoencoders (SAEs) to Automatic Speech Recognition (ASR) systems li…
RESEARCH · CL_27585 · May 10 · 16:23

LLMs show promise and pitfalls for mental health screening

Researchers have developed an agentic LLM framework designed for large-scale mental health screening, which uses a policy-guided evaluation system to ensure trustworthiness and adaptability in clinical settings. A separ…
TOOL · CL_22903 · May 8 · 11:36

Hermes AI adds free, local voice control for Telegram and Discord

A guide details how to implement voice control for the Hermes AI assistant, enabling users to interact with it via spoken commands on platforms like Telegram and Discord. The system utilizes local, free models for speec…
TOOL · CL_21319 · May 7 · 18:26

Whisper fine-tuning pipeline built for Indian languages

This article details the process of building a dataset pipeline for fine-tuning OpenAI's Whisper model to better understand Indian languages. It focuses on the technical steps involved in preparing and processing audio …
TOOL · CL_19104 · May 6 · 00:00

Hugging Face adds private datasets to ASR leaderboard to prevent benchmaxxing

Hugging Face has enhanced its Open ASR Leaderboard by incorporating new, high-quality English Automatic Speech Recognition datasets from Appen Inc. and DataoceanAI. To prevent "benchmaxxing" or test-set contamination, t…
RESEARCH · CL_17939 · May 5 · 21:11

Mistral AI and X-Voice advance multilingual voice cloning with new architectures

Researchers have introduced X-Voice, a compact 0.4B parameter model capable of zero-shot cross-lingual voice cloning in 30 languages. The model utilizes a two-stage training process with a unified International Phonetic…
TOOL · CL_15989 · May 5 · 04:00

BaldWhisper model achieves 48% size reduction and 2.15x speedup

Researchers have developed BaldWhisper, a method to significantly compress and accelerate the Whisper speech-to-text model. By employing low-rank decomposition for embeddings and merging transformer layers, BaldWhisper …
RESEARCH · CL_14473 · May 4 · 04:00

Audio-language models struggle with dysarthric speech context, but fine-tuning shows promise

Researchers have developed a benchmark to test if current audio-language models can effectively use additional clinical context to improve automatic speech recognition for dysarthric speech. Initial findings indicate th…
RESEARCH · CL_22854 · May 3 · 23:37

Needle model distills Gemini for precise tool-calling tasks

A new 26-million parameter model named Needle has been developed, distilled from Google's Gemini to excel specifically at tool-calling tasks. The core innovation lies not in its size, but in its ability to reliably prod…
RESEARCH · CL_08610 · Apr 29 · 04:00

Researchers enhance elderly ASR with LLM paraphrasing and speech synthesis

Researchers have developed a novel data augmentation technique to improve automatic speech recognition (ASR) for elderly individuals. This method utilizes large language models to paraphrase existing transcripts, genera…
RESEARCH · CL_08266 · Apr 28 · 13:18

WhisperPipe architecture slashes ASR latency and memory use for real-time applications

Researchers have developed WhisperPipe, a new streaming architecture designed to improve real-time automatic speech recognition (ASR) performance. This architecture addresses the trade-off between accuracy and computati…
RESEARCH · CL_06729 · Apr 28 · 04:00

New FADE method improves ASR model quantization for edge devices

Researchers have developed FADE, a novel framework for improving post-training quantization of encoder-decoder Automatic Speech Recognition (ASR) models. This method addresses the issue of error accumulation across laye…
RESEARCH · CL_13934 · Apr 27 · 21:55

Talkie-1930: New 13B AI model trained on pre-1931 text explores historical knowledge

A new project called Talkie has released a 13-billion parameter language model trained exclusively on English text from before 1931. This "vintage" model aims to explore AI's ability to predict the future and generate n…
TOOL · CL_47664 · Feb 23 · 00:00

Speech models fail on street names, especially for non-native speakers

Researchers at Together AI have found that current state-of-the-art speech recognition models exhibit a significant failure rate, averaging 39% error in transcribing street names, particularly for non-native English spe…
TOOL · CL_00804 · Apr 22 · 10:00

Speak leverages OpenAI's AI for personalized language learning and global expansion

Speak, a language learning application, is leveraging OpenAI's advanced AI capabilities to create a personalized and highly interactive tutoring experience. The company, which began in 2016, has evolved significantly wi…
TOOL · CL_02402 · Dec 4 · 10:00

Morgan Stanley leverages OpenAI's GPT-4 to enhance financial advisor services

Morgan Stanley has partnered with OpenAI to integrate GPT-4 into its financial advisory services, enhancing advisor efficiency and client engagement. The firm developed an internal chatbot, AI @ Morgan Stanley Assistant…
TOOL · CL_47802 · Dec 11 · 20:21

Replit launches AI templates to speed developer onboarding

Replit has launched a suite of AI-powered templates designed to streamline developer onboarding and accelerate the creation of AI-driven applications. These templates, available for various programming languages and fra…
FRONTIER RELEASE · CL_01524 · Jul 28 · 00:00

OpenAI launches advanced audio models for API, enhancing voice agents

OpenAI has released new, advanced audio models through its API, enhancing capabilities for voice agents. The updated speech-to-text models, including gpt-4o-transcribe and gpt-4o-mini-transcribe, offer improved accuracy…
TOOL · CL_47938 · Jul 29 · 04:00

Replit integrates OpenAI models for coding assistance and education

Replit has partnered with OpenAI to integrate advanced AI models into its coding platform. The company is launching a new course on LLMs and GPT, and has introduced beta features powered by OpenAI's Codex model for code…
