PulseAugur

speech synthesis

PulseAugur coverage of speech synthesis — every cluster mentioning speech synthesis across labs, papers, and developer communities, ranked by signal.

Total · 30d: 18 (18 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 15 (15 over 90d)
[Charts: TIER MIX · 90D, TOPICS, SENTIMENT · 30D (11 days with sentiment data)]

RECENT · PAGE 1/1 · 18 TOTAL
  1. RESEARCH · CL_109575

    New Japanese TTS system tackles kanji polyphony with massive data scaling

    Researchers have developed Sarashina2.2-TTS, a novel text-to-speech system specifically designed for Japanese, addressing the challenge of kanji polyphony. The system utilizes a massive dataset of approximately 361,000 …

  2. RESEARCH · CL_107788

    New benchmark evaluates Chinese news TTS pronunciation accuracy

    Researchers have introduced the CN-NewsTTS Bench, a new benchmark designed to evaluate the pronunciation accuracy of Chinese news Text-to-Speech (TTS) systems. This benchmark specifically targets complex written forms l…

  3. RESEARCH · CL_109476

    Wan-Streamer v0.1: Unified model for real-time audio-visual interaction

    Researchers have introduced Wan-Streamer v0.1, a novel end-to-end multimodal foundation model designed for real-time, low-latency audio-visual interaction. Unlike traditional cascaded systems, Wan-Streamer integrates la…

  4. TOOL · CL_104757

    LLMs benchmarked for Japanese Grapheme-to-Phoneme conversion

    A new study benchmarks over 30 large language models (LLMs) for Japanese grapheme-to-phoneme (G2P) conversion, a crucial step for text-to-speech systems. Researchers compared LLM performance against traditional morpholo…

  5. TOOL · CL_100969

    Gemini API introduces streaming TTS for faster AI voice apps

    Google's Gemini API now offers streaming Text-to-Speech (TTS) capabilities, enabling developers to create AI voice applications that feel more responsive. This feature is crucial for reducing perceived latency, as users…

  6. RESEARCH · CL_98110

    New Hebrew G2P systems improve text-to-speech accuracy

    Researchers have developed new methods for Hebrew grapheme-to-phoneme (G2P) conversion, crucial for improving text-to-speech (TTS) applications. The ReNikud system utilizes audio supervision from unlabeled Hebrew audio …

  7. RESEARCH · CL_93326

    New research explores advanced speech quality assessment methods beyond MOS

    Researchers are exploring new methods for assessing speech quality beyond traditional Mean Opinion Scores (MOS). One paper introduces PrefSQA, which uses pairwise preference prediction to reduce rater variability and im…

  8. RESEARCH · CL_93405

    Neural audio codecs achieve smooth degradation down to 1.6 Hz

    Researchers have investigated the degradation mechanisms in neural audio codecs operating at low frame rates, which are beneficial for autoregressive speech synthesis. Their study identified that a previously observed q…

  9. COMMENTARY · CL_91316

    AI Speech Technologies: A Resource Compilation

    This Mastodon post compiles resources on AI speech technologies, covering Text-to-Speech (TTS), Speech-to-Text (STT), voice synthesis, and voice cloning. The collection aims to provide notes and links for those interest…

  10. RESEARCH · CL_91367

    New TTS research explores discrete flow matching for efficiency

    Two new research papers explore advancements in zero-shot text-to-speech (TTS) technology, focusing on discrete flow matching techniques. The first paper introduces DiFlow-TTS, a framework that uses a discrete flow matc…

  11. RESEARCH · CL_77730

    New TTS Benchmark Uses Blind Voting for Objective Model Ratings

    A new benchmark for Text-to-Speech (TTS) models has been launched, incorporating objective standards and blind voting to create an Elo rating system. This revamped benchmark aims to simplify the process of choosing the …

  12. RESEARCH · CL_74415

    New models unify speech and singing voice generation

    Researchers have developed new unified models for generating human vocal audio, capable of producing both speech and singing. UniVoice uses a conditional flow matching approach, separating content, melody, and timbre to…

  13. RESEARCH · CL_72546

    New TTS framework GLASS enables independent acoustic style control

    Researchers have developed GLASS, a novel framework for controlling acoustic style in zero-shot text-to-speech (TTS) systems. Unlike previous methods that entangle speaker identity with prosody, GLASS treats attributes …

  14. TOOL · CL_65820

    Sparse autoencoders enable interpretable emotion control in TTS

    Researchers have developed a new method for controlling emotions in text-to-speech (TTS) systems by utilizing sparse autoencoders (SAEs) to identify and manipulate latent features within large language models. This appr…

  15. TOOL · CL_11987

    xAI launches Custom Voices for voice cloning and management

    xAI has launched Custom Voices, a new feature allowing users to clone their own voice from a short audio recording for use in various applications. This technology enables personalized narration for videos, podcasts, an…

  16. RESEARCH · CL_08082

    AI advances in 3D simulation, Bengali TTS, and Google Cloud Next trends

    A researcher named Jousef Murad has introduced a new AI framework called Rigid-Deformation Decomposition for simulating 3D vehicle crash dynamics. Separately, a user named Himu is urging Google developers to integrate n…

  17. RESEARCH · CL_06689

    LLM preference optimization advances TTS accuracy and user personalization

    Researchers have developed new methods for aligning large language models (LLMs) with user preferences. One approach, TKTO, focuses on text-to-speech systems, enabling data-efficient, token-level optimization to improve…

  18. RESEARCH · CL_44365

    New benchmarks and platforms advance voice agent evaluation and development

    New research introduces EVA-Bench, a comprehensive framework for evaluating voice agents, addressing challenges in simulating realistic conversations and measuring performance across various failure modes. Simultaneousl…