PulseAugur

speech recognition

PulseAugur coverage of speech recognition — every cluster mentioning speech recognition across labs, papers, and developer communities, ranked by signal.

Total · 30d: 54 (54 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 44 (44 over 90d)
[Charts not reproduced: tier mix (90d), topics, relationships, sentiment (30d). 19 days with sentiment data.]

RECENT · PAGE 1/3 · 54 TOTAL
  1. TOOL · CL_114395 ·

    Whisperian Android app integrates local ASR models

    Whisperian is a highly rated Android application that lets users run local Automatic Speech Recognition (ASR) models with their device's microphone. The app is available for download on the Google Play Store, …

  2. TOOL · CL_111669 ·

    New framework translates spoken Nepali to emotion-conditioned sign language avatars

    Researchers have developed NEST-V1, a novel multimodal framework designed for translating spoken Nepali words into emotion-conditioned sign language avatars. This pilot study focuses on four common Nepali words across t…

  3. RESEARCH · CL_109568 ·

    New neural architecture advances phoneme alignment beyond traditional methods

    Researchers have developed a novel, fully differentiable neural architecture for phoneme alignment, aiming to advance the field beyond traditional HMM-GMM frameworks. This end-to-end system features an encoder for signa…

  4. RESEARCH · CL_109571 ·

    Speech LLMs enhanced by translation-based encoder pre-training

    A new research paper proposes using speech translation to bridge the gap between speech encoders and large language models (LLMs) in Speech LLMs. The paper argues that current architectures have a structural misalignmen…

  5. TOOL · CL_109048 ·

    Hugging Face launches FFASR Leaderboard for real-world ASR benchmarking

    Hugging Face and Treble Technologies have launched the FFASR Leaderboard, an open, community-driven benchmark for evaluating Automatic Speech Recognition (ASR) models in realistic far-field acoustic conditions. This new…

  6. RESEARCH · CL_107788 ·

    New benchmark evaluates Chinese news TTS pronunciation accuracy

    Researchers have introduced the CN-NewsTTS Bench, a new benchmark designed to evaluate the pronunciation accuracy of Chinese news Text-to-Speech (TTS) systems. This benchmark specifically targets complex written forms l…

  7. RESEARCH · CL_107850 ·

    NeuroSonic framework reconstructs speech from EEG signals

    Researchers have developed NeuroSonic, a new framework for reconstructing speech from electroencephalography (EEG) signals. This method utilizes conditional flow matching to learn a deterministic velocity field that tra…

  8. RESEARCH · CL_109476 ·

    Wan-Streamer v0.1: Unified model for real-time audio-visual interaction

    Researchers have introduced Wan-Streamer v0.1, a novel end-to-end multimodal foundation model designed for real-time, low-latency audio-visual interaction. Unlike traditional cascaded systems, Wan-Streamer integrates la…

  9. RESEARCH · CL_107825 ·

    Speech models encode African American English consonant cluster reduction

    Researchers have investigated how speech models like wav2vec 2.0 and Whisper represent consonant cluster reduction (CCR) in African American English (AAE). The study found that both models can accurately distinguish bet…

  10. RESEARCH · CL_106008 ·

    New ASR techniques tackle phonetic errors and judge reliability

    Researchers are developing methods to improve Automatic Speech Recognition (ASR) systems, both for low-resource languages and for specific types of errors. One approach, Error-Aware TF-IDF, uses …

  11. COMMENTARY · CL_98974 ·

    Conversational AI benchmarks fail to capture real-world user experience

    Current benchmark metrics for conversational AI systems often fail to capture the true quality of multi-turn interactions. Issues like accumulated timing mistakes, repetitive confirmations, and unnatural turn-taking can…

  12. RESEARCH · CL_98162 ·

    New research tackles ASR challenges with synthetic speech, LLM optimization, and failure reduction

    Researchers are developing advanced techniques to improve Automatic Speech Recognition (ASR) systems, particularly for challenging scenarios like code-switching and real-time applications. One paper proposes a code-mixi…

  13. TOOL · CL_98011 ·

    New simulator automates air traffic controller training with adapted speech models

    Researchers have developed ASTRA, a new simulator designed to train Air Traffic Control Operators (ATCOs) by automating the role of human simpilots. This system addresses the limitations of existing Western-centric spee…

  14. RESEARCH · CL_98093 · 4 sources tracked

    New AI models tackle Chinese dialect discrimination using speech and transfer learning

    Two new research papers propose advanced methods for distinguishing between Chinese dialects, a task traditionally challenging due to limited text data. One paper introduces a speech-driven approach using Mel Frequency …

  15. RESEARCH · CL_95851 ·

    wav2vec 2.0 architecture shows limited compensation for tonal context

    A new study published on arXiv investigates the wav2vec 2.0 architecture's ability to compensate for phonological context in Mandarin Chinese tones. Researchers found no evidence of compensation in the purely self-superv…

  16. RESEARCH · CL_95860 ·

    Bilingual fine-tuning enhances low-resource speech recognition

    Researchers have developed a method for improving automatic speech recognition (ASR) in low-resource languages through bilingual fine-tuning. The study evaluated this technique across nine diverse language pairs, using …

  17. RESEARCH · CL_95869 ·

    New NAR-MBR decoding boosts speech recognition speed and accuracy

    Researchers have developed a new non-autoregressive decoding framework for speech recognition, termed NAR-MBR decoding. This method aims to improve the speed of speech recognition by generating output tokens in parallel…

  18. TOOL · CL_93544 ·

    G2P supervision yields diminishing returns for phonetic transcription

    A new research paper explores the effectiveness of Grapheme-to-Phoneme (G2P) models for generating phonetic transcriptions at scale. The study found that G2P supervision is beneficial only when fewer than 20 to 30 hours of …

  19. TOOL · CL_93542 ·

    New Romanian speech corpus tackles demographic bias in parliamentary ASR

    Researchers have developed a new dataset and framework for improving Romanian-accented speech recognition, specifically for parliamentary proceedings. The ROManian PARliamentary Speech Corpus (ROMPAR) includes 17.80 hou…

  20. COMMENTARY · CL_91316 ·

    AI speech technologies: a resource compilation

    This Mastodon post compiles resources on AI speech technologies, covering Text-to-Speech (TTS), Speech-to-Text (STT), voice synthesis, and voice cloning. The collection aims to provide notes and links for those interest…