DeRA-MOS: Optimizing Text-to-Music Evaluation via Decoupled Listwise Ranking and Modality Alignment
Researchers have developed DeRA-MOS, a framework for evaluating text-to-music (TTM) systems. It decouples the assessment of music impression from the assessment of text alignment, addressing limitations in current evaluation methods that rely on human scores. DeRA-MOS uses a listwise ranking loss for music impression and a score-anchored alignment loss for text alignment, aiming to better reflect human judgment and enhance cross-modal coherence in TTM generation.
IMPACT: Establishes a more robust paradigm for large-scale text-to-music evaluation, potentially accelerating development and benchmarking in the field.
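
To make the idea of a listwise ranking loss concrete, here is a minimal sketch. The summary does not say which listwise formulation DeRA-MOS uses, so this example assumes a ListNet-style top-one cross-entropy, one common choice. The function names and the sample scores are hypothetical and for illustration only. The score-anchored alignment loss is not sketched because the summary gives no detail about its form.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of floats."""
    m = max(scores)
    log_total = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_total for s in scores]


def listwise_loss(pred_scores, human_scores):
    """ListNet-style loss (an assumed stand-in, not the paper's definition).

    Both score lists are turned into "top-one" probability distributions
    with a softmax; the loss is the cross-entropy of the predicted
    distribution against the one derived from human scores.
    """
    if len(pred_scores) != len(human_scores):
        raise ValueError("score lists must have the same length")
    target = [math.exp(lp) for lp in log_softmax(human_scores)]
    log_pred = log_softmax(pred_scores)
    return -sum(t * lp for t, lp in zip(target, log_pred))


# Hypothetical human impression scores for three clips from one list.
human = [4.5, 3.0, 1.5]
same_order = [4.0, 3.2, 2.0]      # predictor ranks the clips like the humans
reversed_order = [2.0, 3.2, 4.0]  # predictor ranks them in the opposite order

loss_same = listwise_loss(same_order, human)
loss_reversed = listwise_loss(reversed_order, human)
```

Under this assumed loss, a predictor that orders the clips the way human raters do gets a lower loss than one that reverses the order, which is the property a listwise objective is meant to reward: it compares whole lists at once rather than fitting each score in isolation.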