CLAP
PulseAugur coverage of CLAP — every cluster mentioning CLAP across labs, papers, and developer communities, ranked by signal.
-
New RLHF framework aligns audio captions with human preferences
Researchers have developed an audio captioning framework that uses Reinforcement Learning from Human Feedback (RLHF) to better align generated captions with human preferences. This approach employs a reward m…
-
New AI model FIGMA enhances fine-grained music retrieval
Researchers have developed FIGMA, a new architecture designed to improve fine-grained music retrieval using natural language descriptions. Unlike previous models that struggle with detailed musical attributes like tempo…
-
Score-aware training boosts text-to-music generation with limited data
Researchers have developed a novel score-aware training method to improve text-to-music generation, particularly when working with limited data. This technique leverages audio-caption alignment scores as a direct superv…
-
New scoring method boosts noise robustness in audio-language AI
Researchers have developed a new technique called Drift-Augmented Scoring (DAS) to improve the robustness of zero-shot audio-language classification models against acoustic noise. This method adds a small bonus to the c…
-
New Omni-Embed-Audio model enhances audio-text retrieval with LLMs
Researchers have developed Omni-Embed-Audio (OEA), a retrieval-oriented encoder that uses multimodal large language models to improve audio-text retrieval. Unlike previous systems that relied on caption-style …
-
New 'Mental Damage' attack poisons AI music generation
Researchers have identified a new vulnerability in retrieval-augmented text-to-music generation systems, termed "Mental Damage." This attack involves poisoning the music caption database with crafted entries that subtly…
-
New COMET framework analyzes modality gap in audio-text AI
Researchers have introduced COMET, a framework for analyzing the modality gap in audio-text contrastive learning models such as CLAP. COMET uses a PLS-SVD approach to reveal that only a small subset of axes, represent…
-
StreamSplit enables efficient continuous audio learning on edge devices
Researchers have developed StreamSplit, a new framework designed to make contrastive learning practical for edge devices with fluctuating resource constraints. The system uses a distribution-based approach to decouple r…
-
CodecSep enables prompt-driven sound separation in neural audio codec latents
Researchers have developed CodecSep, a new framework for prompt-driven sound separation that operates directly within neural audio codec latent spaces. This approach allows for open-vocabulary separation of audio source…
-
Rust-based Demucs offers local, GPU-accelerated music stem separation
Demucs CLI, a new Rust implementation of the HTDemucs v4 music separation model, has been released. The tool splits songs into individual stems such as vocals, drums, and bass, running entirely locally on a user'…