ENTITY speech synthesis

PulseAugur coverage of speech synthesis — every cluster mentioning speech synthesis across labs, papers, and developer communities, ranked by signal.

Total · 30d

18 over 90d

Releases · 30d

0 over 90d

Papers · 30d

15 over 90d

TIER MIX · 90D

research 12
tool 5
commentary 1

TOPICS

SENTIMENT · 30D

11 day(s) with sentiment data

RECENT · PAGE 1/1 · 18 TOTAL

RESEARCH · CL_109575 · Jun 24 · 03:57

New Japanese TTS system tackles kanji polyphony with massive data scaling

Researchers have developed Sarashina2.2-TTS, a novel text-to-speech system specifically designed for Japanese, addressing the challenge of kanji polyphony. The system utilizes a massive dataset of approximately 361,000 …

RESEARCH · CL_107788 · Jun 23 · 15:34

New benchmark evaluates Chinese news TTS pronunciation accuracy

Researchers have introduced the CN-NewsTTS Bench, a new benchmark designed to evaluate the pronunciation accuracy of Chinese news Text-to-Speech (TTS) systems. This benchmark specifically targets complex written forms l…

RESEARCH · CL_109476 · Jun 23 · 00:00

Wan-Streamer v0.1: Unified model for real-time audio-visual interaction

Researchers have introduced Wan-Streamer v0.1, a novel end-to-end multimodal foundation model designed for real-time, low-latency audio-visual interaction. Unlike traditional cascaded systems, Wan-Streamer integrates la…

TOOL · CL_104757 · Jun 20 · 12:17

LLMs benchmarked for Japanese Grapheme-to-Phoneme conversion

A new study benchmarks over 30 large language models (LLMs) for Japanese grapheme-to-phoneme (G2P) conversion, a crucial step for text-to-speech systems. Researchers compared LLM performance against traditional morpholo…

TOOL · CL_100969 · Jun 19 · 17:01

Gemini API introduces streaming TTS for faster AI voice apps

Google's Gemini API now offers streaming Text-to-Speech (TTS) capabilities, enabling developers to create AI voice applications that feel more responsive. This feature is crucial for reducing perceived latency, as users…

RESEARCH · CL_98110 · Jun 18 · 04:00

New Hebrew G2P systems improve text-to-speech accuracy

Researchers have developed new methods for Hebrew grapheme-to-phoneme (G2P) conversion, crucial for improving text-to-speech (TTS) applications. The ReNikud system utilizes audio supervision from unlabeled Hebrew audio …

RESEARCH · CL_93326 · Jun 16 · 04:00

New research explores advanced speech quality assessment methods beyond MOS

Researchers are exploring new methods for assessing speech quality beyond traditional Mean Opinion Scores (MOS). One paper introduces PrefSQA, which uses pairwise preference prediction to reduce rater variability and im…

RESEARCH · CL_93405 · Jun 15 · 17:06

Neural audio codecs achieve smooth degradation down to 1.6 Hz

Researchers have investigated the degradation mechanisms in neural audio codecs operating at low frame rates, which are beneficial for autoregressive speech synthesis. Their study identified that a previously observed q…

COMMENTARY · CL_91316 · Jun 15 · 06:58

AI Speech Technologies: A Resource Compilation

This Mastodon post compiles resources on AI speech technologies, covering Text-to-Speech (TTS), Speech-to-Text (STT), voice synthesis, and voice cloning. The collection aims to provide notes and links for those interest…

RESEARCH · CL_91367 · Jun 15 · 04:00

New TTS research explores discrete flow matching for efficiency

Two new research papers explore advancements in zero-shot text-to-speech (TTS) technology, focusing on discrete flow matching techniques. The first paper introduces DiFlow-TTS, a framework that uses a discrete flow matc…

RESEARCH · CL_77730 · Jun 8 · 08:37

New TTS Benchmark Uses Blind Voting for Objective Model Ratings

A new benchmark for Text-to-Speech (TTS) models has been launched, incorporating objective standards and blind voting to create an Elo rating system. This revamped benchmark aims to simplify the process of choosing the …

RESEARCH · CL_74415 · Jun 5 · 07:59

New models unify speech and singing voice generation

Researchers have developed new unified models for generating human vocal audio, capable of producing both speech and singing. UniVoice uses a conditional flow matching approach, separating content, melody, and timbre to…

RESEARCH · CL_72546 · Jun 4 · 08:58

New TTS framework GLASS enables independent acoustic style control

Researchers have developed GLASS, a novel framework for controlling acoustic style in zero-shot text-to-speech (TTS) systems. Unlike previous methods that entangle speaker identity with prosody, GLASS treats attributes …

TOOL · CL_65820 · Jun 2 · 04:00

Sparse autoencoders enable interpretable emotion control in TTS

Researchers have developed a new method for controlling emotions in text-to-speech (TTS) systems by utilizing sparse autoencoders (SAEs) to identify and manipulate latent features within large language models. This appr…

TOOL · CL_11987 · Apr 30 · 00:00

xAI launches Custom Voices for voice cloning and management

xAI has launched Custom Voices, a new feature allowing users to clone their own voice from a short audio recording for use in various applications. This technology enables personalized narration for videos, podcasts, an…

RESEARCH · CL_08082 · Apr 28 · 23:45

AI advances in 3D simulation, Bengali TTS, and Google Cloud Next trends

A researcher named Jousef Murad has introduced a new AI framework called Rigid-Deformation Decomposition for simulating 3D vehicle crash dynamics. Separately, a user named Himu is urging Google developers to integrate n…

RESEARCH · CL_06689 · Apr 28 · 04:00

LLM preference optimization advances TTS accuracy and user personalization

Researchers have developed new methods for aligning large language models (LLMs) with user preferences. One approach, TKTO, focuses on text-to-speech systems, enabling data-efficient, token-level optimization to improve…

RESEARCH · CL_44365 · Nov 4 · 00:00

New benchmarks and platforms advance voice agent evaluation and development

New research introduces EVA-Bench, a comprehensive framework for evaluating voice agents, addressing challenges in simulating realistic conversations and measuring performance across various failure modes. Simultaneousl…
