PiDA: Phonetically-Informed Data Augmentation for Robust Vietnamese Speech Translation
Researchers have developed a data augmentation technique, Phonetically-Informed Data Augmentation (PiDA), to improve Vietnamese speech translation. The method targets error propagation in cascaded speech translation systems, where mistakes made by the automatic speech recognition (ASR) stage carry over into the translation stage, by generating ASR-like corruptions based on phonetic confusions. Fine-tuning with PiDA on the FLEURS Vietnamese-English dataset improved translation accuracy on erroneous ASR outputs, with a notable gain in BLEU score.
AI IMPACT: Improves the robustness of speech translation systems to ASR errors, potentially enhancing usability in noisy environments.
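To make the idea of "ASR-like corruptions based on phonetic confusions" concrete, here is a minimal Python sketch. It is not the PiDA implementation: the confusion table, the onset list, and the names `CONFUSIONS`, `ONSETS`, `split_onset`, and `corrupt_sentence` are all hypothetical, chosen only to illustrate swapping syllable-initial consonants that are plausibly confusable in Vietnamese. The actual method may use a different confusion inventory and sampling scheme.

```python
import random

# Hypothetical confusion table (illustrative only, NOT taken from PiDA):
# syllable-initial consonants that a recognizer might plausibly mix up.
CONFUSIONS = {
    "tr": ["ch"],
    "ch": ["tr"],
    "s": ["x"],
    "x": ["s"],
    "l": ["n"],
    "n": ["l"],
}

# Onsets checked longest-first so that "ngh", "ng", and "nh" are never
# mistaken for a bare "n". The list is deliberately incomplete.
ONSETS = sorted(["ngh", "ng", "nh", "tr", "ch", "s", "x", "l", "n"],
                key=len, reverse=True)


def split_onset(syllable):
    """Return (onset, rest); onset is "" if no listed onset matches."""
    for onset in ONSETS:
        if syllable.startswith(onset):
            return onset, syllable[len(onset):]
    return "", syllable


def corrupt_sentence(sentence, rate=0.3, seed=0):
    """Replace confusable onsets in a lowercase, space-separated sentence.

    Each syllable is considered for corruption with probability `rate`.
    """
    rng = random.Random(seed)
    corrupted = []
    for syllable in sentence.split():
        onset, rest = split_onset(syllable)
        if onset in CONFUSIONS and rng.random() < rate:
            syllable = rng.choice(CONFUSIONS[onset]) + rest
        corrupted.append(syllable)
    return " ".join(corrupted)


if __name__ == "__main__":
    print(corrupt_sentence("trời sáng nghe", rate=1.0))
```

Under these assumptions, each corrupted Vietnamese sentence could be paired with the original English reference and added to the fine-tuning data, so that the translation model sees inputs resembling erroneous ASR output during training.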