NVMOS: Non-Verbal Vocalization Quality Assessment in Speech
Researchers have developed NVMOS, a model that assesses the perceptual quality of non-verbal vocalizations (NVs) in speech, such as laughter and sighs. Existing methods and general-purpose multimodal models like Gemini evaluate these NV events inconsistently. NVMOS is trained on a dataset of NV-TTS system outputs and natural NVs rated by acoustic experts, and aims to reach expert-level agreement when predicting NV quality.
AI IMPACT: Introduces a specialized model for evaluating non-verbal vocalizations, which could improve TTS systems and the analysis of human-computer interaction.