KIT's Submission to Cross-Lingual Voice Cloning in IWSLT 2026
Researchers from KIT describe their IWSLT 2026 submission on cross-lingual voice cloning, which renders translated speech in the original speaker's voice. The system builds on the FishAudio-S2-Pro multilingual text-to-speech model and adds language tag prompting to strengthen language control and reduce accent bleed-through. The team also used reinforcement learning for fine-tuning and introduced a reference-conditioned lexical matching technique to improve the pronunciation of specialized vocabulary.
AI IMPACT This work targets more natural and intelligible translated speech, which could make multilingual communication systems more seamless.