Brief · PulseAugur

TOOL · arXiv cs.AI

NVMOS: Non-Verbal Vocalization Quality Assessment in Speech

Researchers have developed NVMOS, a model that assesses the perceptual quality of non-verbal vocalizations (NVs) in speech, such as laughter and sighs. Existing methods and general-purpose multimodal models such as Gemini evaluate these NV events inconsistently. NVMOS is trained on a dataset of NV-TTS system outputs and natural NVs rated by acoustic experts, and aims to reach expert-level agreement when predicting NV quality.

IMPACT: Introduces a specialized model for evaluating non-verbal vocalizations, potentially improving TTS systems and the analysis of human-computer interaction.
