UR-BERT enables 495-language multilingual TTS systems

By PulseAugur Editorial · [2 sources] · 2026-06-10 05:51

Researchers have proposed UR-BERT, a text encoder for massively multilingual text-to-speech (TTS) systems. Conventional grapheme-to-phoneme (G2P) approaches are limited to around 100 languages by the availability of reliable G2P resources. UR-BERT instead converts diverse writing systems into a common Romanized format, which enables support for 495 languages. It also adds a speech token prediction objective intended to improve phonetic accuracy and text-speech alignment. The authors report that it outperforms existing baselines and generalizes well to new languages.
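
To make the Romanization idea concrete, here is a minimal sketch of mapping text from several scripts into one Latin character set. It is an illustration only, not the paper's method: the summary does not say which romanizer UR-BERT uses, and the names `TOY_ROMAN_TABLE` and `toy_romanize` are hypothetical, with a hand-written table that covers just a few Cyrillic and Greek letters.

```python
import unicodedata

# Hypothetical toy table (an assumption for illustration): a handful of
# Cyrillic and Greek letters mapped to Latin. A real romanizer would cover
# far more scripts and handle context-dependent rules.
TOY_ROMAN_TABLE = {
    "п": "p", "р": "r", "и": "i", "в": "v", "е": "e", "т": "t",
    "γ": "g", "ε": "e", "ι": "i", "α": "a",
}


def toy_romanize(text: str) -> str:
    """Map text to a shared lowercase Latin form using the toy table."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text.lower())
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop accents and other combining marks
        # Characters missing from the table pass through unchanged.
        out.append(TOY_ROMAN_TABLE.get(ch, ch))
    return "".join(out)


if __name__ == "__main__":
    for word in ["Привет", "γειά", "Café"]:
        print(word, "->", toy_romanize(word))
```

With this table, "Привет", "γειά", and "Café" become "privet", "geia", and "cafe", so a single Latin character vocabulary can represent all three inputs. That shared input space is what lets one encoder serve many languages without a per-language G2P resource.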

IMPACT Extends TTS coverage to hundreds of languages that lack reliable G2P resources, which could make voice synthesis available to many more speakers.

RANK_REASON The cluster contains a research paper detailing a new model architecture for a specific AI task.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv cs.CL TIER_1 English(EN) · Sangmin Lee, Eekgyun Ahn, Woongjib Choi, Hong-Goo Kang · 2026-06-11 04:00

UR-BERT: Scaling Text Encoders for Massively Multilingual TTS Through Universal Romanization and Speech Token Prediction

arXiv:2606.11681v1 · Abstract: We propose UR-BERT, a Romanized transcription-based text-to-speech (TTS) encoder for massively multilingual TTS systems. Conventional grapheme-to-phoneme (G2P)-based approaches are limited to around 100 languages due to the availabi…

arXiv cs.CL TIER_1 English(EN) · Hong-Goo Kang · 2026-06-10 05:51

UR-BERT: Scaling Text Encoders for Massively Multilingual TTS Through Universal Romanization and Speech Token Prediction

We propose UR-BERT, a Romanized transcription-based text-to-speech (TTS) encoder for massively multilingual TTS systems. Conventional grapheme-to-phoneme (G2P)-based approaches are limited to around 100 languages due to the availability of reliable G2P resources. In contrast, UR-…
