TOOL · arXiv cs.CL · 2d

Omni-Embed-Audio: Leveraging Multimodal LLMs for Robust Audio-Text Retrieval

Researchers have developed Omni-Embed-Audio (OEA), a retrieval-oriented encoder that uses multimodal large language models to improve audio-text retrieval. Unlike previous systems that relied on caption-style queries, OEA is designed to handle more natural search behaviors, including questions, commands, and negative queries. Experiments show that OEA performs comparably to existing state-of-the-art models in text-to-audio retrieval, while significantly outperforming them in text-to-text retrieval and in distinguishing between similar-sounding audio clips.

IMPACT Introduces a more robust method for audio-text retrieval, potentially improving search capabilities in multimodal AI applications.

CLAP
Contrastive Language-Audio Pretraining
Omni-Embed-Audio
HaeJun Yoo