ENTITY speech recognition

PulseAugur coverage of speech recognition — every cluster mentioning speech recognition across labs, papers, and developer communities, ranked by signal.

Total · 30d

54 over 90d

Releases · 30d

0 over 90d

Papers · 30d

44 over 90d

TIER MIX · 90D

research 25
tool 25
commentary 4

SENTIMENT · 30D

19 days with sentiment data

RECENT · PAGE 1/3 · 54 TOTAL

TOOL · CL_114395 · Jun 28 · 09:11

Whisperian Android app integrates local ASR models

Whisperian is a highly-rated Android application that allows users to leverage local Automatic Speech Recognition (ASR) models with their device's microphone. The app is available for download on the Google Play Store, …
TOOL · CL_111669 · Jun 26 · 04:00

New framework translates spoken Nepali to emotion-conditioned sign language avatars

Researchers have developed NEST-V1, a novel multimodal framework designed for translating spoken Nepali words into emotion-conditioned sign language avatars. This pilot study focuses on four common Nepali words across t…
RESEARCH · CL_109568 · Jun 24 · 06:42

New neural architecture advances phoneme alignment beyond traditional methods

Researchers have developed a novel, fully differentiable neural architecture for phoneme alignment, aiming to advance the field beyond traditional HMM-GMM frameworks. This end-to-end system features an encoder for signa…
RESEARCH · CL_109571 · Jun 24 · 06:15

Speech LLMs enhanced by translation-based encoder pre-training

A new research paper proposes using speech translation to bridge the gap between speech encoders and large language models (LLMs) in Speech LLMs. The paper argues that current architectures have a structural misalignmen…
TOOL · CL_109048 · Jun 24 · 00:00

Hugging Face launches FFASR Leaderboard for real-world ASR benchmarking

Hugging Face and Treble Technologies have launched the FFASR Leaderboard, an open, community-driven benchmark for evaluating Automatic Speech Recognition (ASR) models in realistic far-field acoustic conditions. This new…
RESEARCH · CL_107788 · Jun 23 · 15:34

New benchmark evaluates Chinese news TTS pronunciation accuracy

Researchers have introduced the CN-NewsTTS Bench, a new benchmark designed to evaluate the pronunciation accuracy of Chinese news Text-to-Speech (TTS) systems. This benchmark specifically targets complex written forms l…
RESEARCH · CL_107850 · Jun 23 · 03:04

NeuroSonic framework reconstructs speech from EEG signals

Researchers have developed NeuroSonic, a new framework for reconstructing speech from electroencephalography (EEG) signals. This method utilizes conditional flow matching to learn a deterministic velocity field that tra…
RESEARCH · CL_109476 · Jun 23 · 00:00

Wan-Streamer v0.1: Unified model for real-time audio-visual interaction

Researchers have introduced Wan-Streamer v0.1, a novel end-to-end multimodal foundation model designed for real-time, low-latency audio-visual interaction. Unlike traditional cascaded systems, Wan-Streamer integrates la…
RESEARCH · CL_107825 · Jun 22 · 21:19

Speech models encode African American English consonant cluster reduction

Researchers have investigated how speech models like wav2vec 2.0 and Whisper represent consonant cluster reduction (CCR) in African American English (AAE). The study found that both models can accurately distinguish bet…
RESEARCH · CL_106008 · Jun 19 · 16:43

New ASR techniques tackle phonetic errors and judge reliability

Researchers are developing advanced methods to improve Automatic Speech Recognition (ASR) systems, particularly for low-resource languages and to address specific types of errors. One approach, Error-Aware TF-IDF, uses …
COMMENTARY · CL_98974 · Jun 18 · 15:29

Conversational AI benchmarks fail to capture real-world user experience

Current benchmark metrics for conversational AI systems often fail to capture the true quality of multi-turn interactions. Issues like accumulated timing mistakes, repetitive confirmations, and unnatural turn-taking can…
RESEARCH · CL_98162 · Jun 18 · 04:00

New research tackles ASR challenges with synthetic speech, LLM optimization, and failure reduction

Researchers are developing advanced techniques to improve Automatic Speech Recognition (ASR) systems, particularly for challenging scenarios like code-switching and real-time applications. One paper proposes a code-mixi…
TOOL · CL_98011 · Jun 18 · 04:00

New simulator automates air traffic controller training with adapted speech models

Researchers have developed ASTRA, a new simulator designed to train Air Traffic Control Operators (ATCOs) by automating the role of human simpilots. This system addresses the limitations of existing Western-centric spee…
RESEARCH · CL_98093 · Jun 17 · 01:23

New AI models tackle Chinese dialect discrimination using speech and transfer learning · 4 sources tracked

Two new research papers propose advanced methods for distinguishing between Chinese dialects, a task traditionally challenging due to limited text data. One paper introduces a speech-driven approach using Mel Frequency …
RESEARCH · CL_95851 · Jun 16 · 12:03

wav2vec2.0 architecture shows limited compensation for tonal context

A new study published on arXiv investigates the wav2vec2.0 architecture's ability to compensate for phonological context in Mandarin Chinese tones. Researchers found no evidence of compensation in the purely self-superv…
RESEARCH · CL_95860 · Jun 16 · 11:48

Bilingual Fine-Tuning Enhances Low-Resource Speech Recognition

Researchers have developed a method for improving automatic speech recognition (ASR) in low-resource languages through bilingual fine-tuning. The study evaluated this technique across nine diverse language pairs, using …
RESEARCH · CL_95869 · Jun 16 · 05:28

New NAR-MBR Decoding Boosts Speech Recognition Speed and Accuracy

Researchers have developed a new non-autoregressive decoding framework for speech recognition, termed NAR-MBR decoding. This method aims to improve the speed of speech recognition by generating output tokens in parallel…
TOOL · CL_93544 · Jun 16 · 04:00

G2P supervision yields diminishing returns for phonetic transcription

A new research paper explores the effectiveness of Grapheme-to-Phoneme (G2P) models for generating phonetic transcriptions at scale. The study found that G2P supervision is beneficial only when fewer than 20-30 hours of …
TOOL · CL_93542 · Jun 16 · 04:00

New Romanian speech corpus tackles demographic bias in parliamentary ASR

Researchers have developed a new dataset and framework for improving Romanian-accented speech recognition, specifically for parliamentary proceedings. The ROManian PARliamentary Speech Corpus (ROMPAR) includes 17.80 hou…
COMMENTARY · CL_91316 · Jun 15 · 06:58

AI Speech Technologies: A Resource Compilation

This Mastodon post compiles resources on AI speech technologies, covering Text-to-Speech (TTS), Speech-to-Text (STT), voice synthesis, and voice cloning. The collection aims to provide notes and links for those interest…
