PulseAugur

Whisper

PulseAugur coverage of Whisper — every cluster mentioning Whisper across labs, papers, and developer communities, ranked by signal.

Total · 30d: 59 (59 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 29 (29 over 90d)
TIMELINE
  1. 2026-06-09 research_milestone A study on fine-tuning OpenAI's Whisper for Swiss German ASR revealed improved performance and identified benchmark contamination issues.
  2. 2026-05-12 research_milestone A new semi-supervised framework for speech confidence detection was proposed, achieving a Macro-F1 score of 0.751.
SENTIMENT · 30D

19 days with sentiment data

RECENT · PAGE 3/3 · 59 TOTAL
  1. RESEARCH · CL_25987

    AI interpretability advances with Sparse Autoencoders for ASR and functional operators

    Researchers are exploring advanced techniques for interpreting the internal workings of complex AI models. One paper details the application of Sparse Autoencoders (SAEs) to Automatic Speech Recognition (ASR) systems li…

  2. RESEARCH · CL_27585

    LLMs show promise and pitfalls for mental health screening

    Researchers have developed an agentic LLM framework designed for large-scale mental health screening, which uses a policy-guided evaluation system to ensure trustworthiness and adaptability in clinical settings. A separ…

  3. TOOL · CL_22903

    Hermes AI adds free, local voice control for Telegram and Discord

    A guide details how to implement voice control for the Hermes AI assistant, enabling users to interact with it via spoken commands on platforms like Telegram and Discord. The system utilizes local, free models for speec…

  4. TOOL · CL_21319

    Whisper fine-tuning pipeline built for Indian languages

    This article details the process of building a dataset pipeline for fine-tuning OpenAI's Whisper model to better understand Indian languages. It focuses on the technical steps involved in preparing and processing audio …

  5. TOOL · CL_19104

    Hugging Face adds private datasets to ASR leaderboard to prevent benchmaxxing

    Hugging Face has enhanced its Open ASR Leaderboard by incorporating new, high-quality English Automatic Speech Recognition datasets from Appen Inc. and DataoceanAI. To prevent "benchmaxxing" or test-set contamination, t…

  6. RESEARCH · CL_17939

    Mistral AI and X-Voice advance multilingual voice cloning with new architectures

    Researchers have introduced X-Voice, a compact 0.4B parameter model capable of zero-shot cross-lingual voice cloning in 30 languages. The model utilizes a two-stage training process with a unified International Phonetic…

  7. TOOL · CL_15989

    BaldWhisper model achieves 48% size reduction and 2.15x speedup

    Researchers have developed BaldWhisper, a method to significantly compress and accelerate the Whisper speech-to-text model. By employing low-rank decomposition for embeddings and merging transformer layers, BaldWhisper …

  8. RESEARCH · CL_14473

    Audio-language models struggle with dysarthric speech context, but fine-tuning shows promise

    Researchers have developed a benchmark to test if current audio-language models can effectively use additional clinical context to improve automatic speech recognition for dysarthric speech. Initial findings indicate th…

  9. RESEARCH · CL_22854

    Needle model distills Gemini for precise tool-calling tasks

    A new 26-million parameter model named Needle has been developed, distilled from Google's Gemini to excel specifically at tool-calling tasks. The core innovation lies not in its size, but in its ability to reliably prod…

  10. RESEARCH · CL_08610

    Researchers enhance elderly ASR with LLM paraphrasing and speech synthesis

    Researchers have developed a novel data augmentation technique to improve automatic speech recognition (ASR) for elderly individuals. This method utilizes large language models to paraphrase existing transcripts, genera…

  11. RESEARCH · CL_08266

    WhisperPipe architecture slashes ASR latency and memory use for real-time applications

    Researchers have developed WhisperPipe, a new streaming architecture designed to improve real-time automatic speech recognition (ASR) performance. This architecture addresses the trade-off between accuracy and computati…

  12. RESEARCH · CL_06729

    New FADE method improves ASR model quantization for edge devices

    Researchers have developed FADE, a novel framework for improving post-training quantization of encoder-decoder Automatic Speech Recognition (ASR) models. This method addresses the issue of error accumulation across laye…

  13. RESEARCH · CL_13934

    Talkie-1930: New 13B AI model trained on pre-1931 text explores historical knowledge

    A new project called Talkie has released a 13-billion parameter language model trained exclusively on English text from before 1931. This "vintage" model aims to explore AI's ability to predict the future and generate n…

  14. TOOL · CL_47664

    Speech models fail on street names, especially for non-native speakers

    Researchers at Together AI have found that current state-of-the-art speech recognition models exhibit a significant failure rate, averaging 39% error in transcribing street names, particularly for non-native English spe…

  15. TOOL · CL_00804

    Speak leverages OpenAI's AI for personalized language learning and global expansion

    Speak, a language learning application, is leveraging OpenAI's advanced AI capabilities to create a personalized and highly interactive tutoring experience. The company, which began in 2016, has evolved significantly wi…

  16. TOOL · CL_02402

    Morgan Stanley leverages OpenAI's GPT-4 to enhance financial advisor services

    Morgan Stanley has partnered with OpenAI to integrate GPT-4 into its financial advisory services, enhancing advisor efficiency and client engagement. The firm developed an internal chatbot, AI @ Morgan Stanley Assistant…

  17. TOOL · CL_47802

    Replit launches AI templates to speed developer onboarding

    Replit has launched a suite of AI-powered templates designed to streamline developer onboarding and accelerate the creation of AI-driven applications. These templates, available for various programming languages and fra…

  18. FRONTIER RELEASE · CL_01524

    OpenAI launches advanced audio models for API, enhancing voice agents

    OpenAI has released new, advanced audio models through its API, enhancing capabilities for voice agents. The updated speech-to-text models, including gpt-4o-transcribe and gpt-4o-mini-transcribe, offer improved accuracy…

  19. TOOL · CL_47938

    Replit integrates OpenAI models for coding assistance and education

    Replit has partnered with OpenAI to integrate advanced AI models into its coding platform. The company is launching a new course on LLMs and GPT, and has introduced beta features powered by OpenAI's Codex model for code…