UR-BERT: Scaling Text Encoders for Massively Multilingual TTS Through Universal Romanization and Speech Token Prediction
Researchers have developed UR-BERT, a text encoder for massively multilingual text-to-speech (TTS) systems. Traditional approaches are limited by the availability of grapheme-to-phoneme resources; UR-BERT instead converts diverse writing systems into a common Romanized form, which lets it support 495 languages. It is also trained with a speech token prediction objective intended to improve phonetic accuracy and text-speech alignment. The system outperforms existing baselines and generalizes well to new languages.
AI IMPACT: Extends TTS technology to hundreds of additional languages, potentially bringing voice synthesis to languages that lack grapheme-to-phoneme resources.