Whisper
PulseAugur coverage of Whisper — every cluster mentioning Whisper across labs, papers, and developer communities, ranked by signal.
- 2026-06-09 research_milestone A study on fine-tuning OpenAI's Whisper for Swiss German ASR revealed improved performance and identified benchmark contamination issues.
- 2026-05-12 research_milestone A new semi-supervised framework for speech confidence detection was proposed, achieving a Macro-F1 score of 0.751.
-
AI interpretability advances with Sparse Autoencoders for ASR and functional operators
Researchers are exploring advanced techniques for interpreting the internal workings of complex AI models. One paper details the application of Sparse Autoencoders (SAEs) to Automatic Speech Recognition (ASR) systems li…
-
LLMs show promise and pitfalls for mental health screening
Researchers have developed an agentic LLM framework designed for large-scale mental health screening, which uses a policy-guided evaluation system to ensure trustworthiness and adaptability in clinical settings. A separ…
-
Hermes AI adds free, local voice control for Telegram and Discord
A guide details how to implement voice control for the Hermes AI assistant, enabling users to interact with it via spoken commands on platforms like Telegram and Discord. The system utilizes local, free models for speec…
-
Whisper fine-tuning pipeline built for Indian languages
This article details the process of building a dataset pipeline for fine-tuning OpenAI's Whisper model to better understand Indian languages. It focuses on the technical steps involved in preparing and processing audio …
-
Hugging Face adds private datasets to ASR leaderboard to prevent benchmaxxing
Hugging Face has enhanced its Open ASR Leaderboard by incorporating new, high-quality English Automatic Speech Recognition datasets from Appen Inc. and DataoceanAI. To prevent "benchmaxxing" or test-set contamination, t…
-
Mistral AI and X-Voice advance multilingual voice cloning with new architectures
Researchers have introduced X-Voice, a compact 0.4B parameter model capable of zero-shot cross-lingual voice cloning in 30 languages. The model utilizes a two-stage training process with a unified International Phonetic…
-
BaldWhisper model achieves 48% size reduction and 2.15x speedup
Researchers have developed BaldWhisper, a method to significantly compress and accelerate the Whisper speech-to-text model. By employing low-rank decomposition for embeddings and merging transformer layers, BaldWhisper …
-
Audio-language models struggle with dysarthric speech context, but fine-tuning shows promise
Researchers have developed a benchmark to test if current audio-language models can effectively use additional clinical context to improve automatic speech recognition for dysarthric speech. Initial findings indicate th…
-
Needle model distills Gemini for precise tool-calling tasks
A new 26-million parameter model named Needle has been developed, distilled from Google's Gemini to excel specifically at tool-calling tasks. The core innovation lies not in its size, but in its ability to reliably prod…
-
Researchers enhance elderly ASR with LLM paraphrasing and speech synthesis
Researchers have developed a novel data augmentation technique to improve automatic speech recognition (ASR) for elderly individuals. This method utilizes large language models to paraphrase existing transcripts, genera…
-
WhisperPipe architecture slashes ASR latency and memory use for real-time applications
Researchers have developed WhisperPipe, a new streaming architecture designed to improve real-time automatic speech recognition (ASR) performance. This architecture addresses the trade-off between accuracy and computati…
-
New FADE method improves ASR model quantization for edge devices
Researchers have developed FADE, a novel framework for improving post-training quantization of encoder-decoder Automatic Speech Recognition (ASR) models. This method addresses the issue of error accumulation across laye…
-
Talkie-1930: New 13B AI model trained on pre-1931 text explores historical knowledge
A new project called Talkie has released a 13-billion parameter language model trained exclusively on English text from before 1931. This "vintage" model aims to explore AI's ability to predict the future and generate n…
-
Speech models fail on street names, especially for non-native speakers
Researchers at Together AI have found that current state-of-the-art speech recognition models exhibit a significant failure rate, averaging 39% error in transcribing street names, particularly for non-native English spe…
-
Speak leverages OpenAI's AI for personalized language learning and global expansion
Speak, a language learning application, is leveraging OpenAI's advanced AI capabilities to create a personalized and highly interactive tutoring experience. The company, which began in 2016, has evolved significantly wi…
-
Morgan Stanley leverages OpenAI's GPT-4 to enhance financial advisor services
Morgan Stanley has partnered with OpenAI to integrate GPT-4 into its financial advisory services, enhancing advisor efficiency and client engagement. The firm developed an internal chatbot, AI @ Morgan Stanley Assistant…
-
Replit launches AI templates to speed developer onboarding
Replit has launched a suite of AI-powered templates designed to streamline developer onboarding and accelerate the creation of AI-driven applications. These templates, available for various programming languages and fra…
-
OpenAI launches advanced audio models for API, enhancing voice agents
OpenAI has released new, advanced audio models through its API, enhancing capabilities for voice agents. The updated speech-to-text models, including gpt-4o-transcribe and gpt-4o-mini-transcribe, offer improved accuracy…
-
Replit integrates OpenAI models for coding assistance and education
Replit has partnered with OpenAI to integrate advanced AI models into its coding platform. The company is launching a new course on LLMs and GPT, and has introduced beta features powered by OpenAI's Codex model for code…