PulseAugur

LoRA Fine-Tuning Boosts Low-Resource TTS Quality for Khmer

Researchers have shown that lightweight fine-tuning can improve text-to-speech (TTS) quality for a language the base model handles poorly. They fine-tuned the 2.4B-parameter VoxCPM2 model with a single Low-Rank Adaptation (LoRA) adapter, raising the Mean Opinion Score (MOS) for Khmer from 3.85 to 4.23. The adapter trains only a small fraction of the model's parameters, which keeps the adaptation cheap. The gain was specific to the language where the base model started out weak: for Korean, which the base model already handled well, the adapter brought no benefit and even degraded quality.
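To make the "small fraction of the parameters" point concrete, here is a minimal sketch of how a LoRA adapter modifies one linear layer and how few weights it trains. The layer size (2048 x 2048), the rank (8), and the function names are illustrative assumptions, not values or code from the paper or from VoxCPM2.

```python
def lora_param_counts(d_in, d_out, rank):
    """Return (frozen, trainable) parameter counts for one LoRA-adapted linear layer."""
    frozen = d_in * d_out              # base weight W (d_out x d_in) stays fixed
    trainable = rank * (d_in + d_out)  # A is rank x d_in, B is d_out x rank
    return frozen, trainable


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


def lora_forward(x, W, A, B, scale=1.0):
    """Compute y = W x + scale * B (A x) for a single input vector x."""
    base = matvec(W, x)               # output of the frozen base layer
    update = matvec(B, matvec(A, x))  # low-rank correction learned by the adapter
    return [b + scale * u for b, u in zip(base, update)]


# Hypothetical layer size and rank, chosen only for illustration.
frozen, trainable = lora_param_counts(d_in=2048, d_out=2048, rank=8)
fraction = trainable / frozen
print(f"trainable share of this layer: {fraction:.4%}")
```

With B initialized to zeros, as is common for LoRA, the adapted layer starts out identical to the base layer, and training only moves A and B.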

IMPACT This research demonstrates an efficient method for improving TTS quality in under-resourced languages, potentially broadening access to advanced speech synthesis technologies.

RANK_REASON The cluster contains an academic paper detailing a new research methodology for improving TTS models.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 2 sources. How we write summaries →

COVERAGE [2]

  1. arXiv cs.CL TIER_1 English(EN) · Phannet Pov, Sovandara Chhoun, Hyun Woo Park, Wan-Sup Cho, Saksonita Khoeurn

    Closing the Quality Gap in Low-Resource Text-to-Speech: LoRA Fine-Tuning of VoxCPM2 for Khmer and Korean

    arXiv:2606.26618v1. Abstract: Large pretrained text-to-speech (TTS) models sound almost human for well-resourced languages, but much worse for languages that are rare in their training data. We study this quality gap for Khmer and Korean using VoxCPM2, a 2.4B-pa…

  2. arXiv cs.CL TIER_1 English(EN) · Saksonita Khoeurn

    Closing the Quality Gap in Low-Resource Text-to-Speech: LoRA Fine-Tuning of VoxCPM2 for Khmer and Korean

    Large pretrained text-to-speech (TTS) models sound almost human for well-resourced languages, but much worse for languages that are rare in their training data. We study this quality gap for Khmer and Korean using VoxCPM2, a 2.4B-parameter, tokenizer-free TTS model that joins a M…