Brief

last 24h

[2/2] 223 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
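As a rough illustration of how four such factors could combine into a 0–100 score, here is a minimal sketch. The weights, the 24-hour half-life, and the function names are all assumptions made for the example; the feed does not state its actual formula.

```python
# Hypothetical weights: the feed names the factors but not how they are combined.
WEIGHTS = {"authority": 0.4, "cluster_strength": 0.3, "headline_signal": 0.3}
HALF_LIFE_HOURS = 24.0  # assumed half-life for the time-decay factor


def time_decay(age_hours, half_life=HALF_LIFE_HOURS):
    """Exponential decay: 1.0 for a fresh story, 0.5 after one half-life."""
    return 0.5 ** (age_hours / half_life)


def score_story(authority, cluster_strength, headline_signal, age_hours):
    """Combine factor values in [0, 1] into a score on a 0-100 scale."""
    base = (
        WEIGHTS["authority"] * authority
        + WEIGHTS["cluster_strength"] * cluster_strength
        + WEIGHTS["headline_signal"] * headline_signal
    )
    return round(100.0 * base * time_decay(age_hours), 1)


if __name__ == "__main__":
    # A well-sourced story that is 23 hours old.
    print(score_story(0.9, 0.6, 0.7, age_hours=23))
```

Multiplying by the decay term, rather than adding it, means an old story cannot score highly however strong its other signals are. That is one plausible design choice, not necessarily the one used here.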

RESEARCH · Hugging Face Daily Papers English(EN) · 23h · [3 sources]

Listen, Look, and Learn: Learning Without Forgetting through SAM-Audio

Researchers have developed a new method for class-incremental learning (CIL) in audio-visual settings, addressing the challenge of acquiring new knowledge without losing previously learned information. The approach integrates the SAM-Audio multimodal model, using its audio features to guide visual representations through a novel attention strategy. To further combat catastrophic forgetting, the method adds distillation objectives at both the feature and logit levels. It is reported to outperform existing state-of-the-art techniques on audio-visual CIL benchmarks.
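To make the idea of distilling at two levels concrete, here is a minimal sketch of one common formulation: a mean-squared-error term on features plus a temperature-softened KL term on logits, both comparing the frozen previous model with the current one. The exact losses, weights, and temperature used in the paper are not given in this summary, so every name and default below is an assumption.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def feature_distillation(old_feats, new_feats):
    """Mean squared error between old-model and new-model features."""
    return sum((o - n) ** 2 for o, n in zip(old_feats, new_feats)) / len(old_feats)


def logit_distillation(old_logits, new_logits, temperature=2.0):
    """KL(old || new) over softened class distributions (old classes only)."""
    p = softmax(old_logits, temperature)
    q = softmax(new_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def dual_level_loss(old_feats, new_feats, old_logits, new_logits,
                    feat_weight=1.0, logit_weight=1.0):
    """Weighted sum of the two terms; the weights are hypothetical."""
    return (feat_weight * feature_distillation(old_feats, new_feats)
            + logit_weight * logit_distillation(old_logits, new_logits))


if __name__ == "__main__":
    print(dual_level_loss([0.2, 0.8], [0.25, 0.7], [2.0, 0.5], [1.5, 0.9]))
```

In training, a term like this would be added to the usual classification loss on the new classes, so the model is penalized for drifting away from what the previous model computed.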

IMPACT Introduces a novel approach to audio-visual class-incremental learning, potentially improving continuous learning capabilities in multimodal AI systems.
- SAM-Audio
- Class-Incremental Learning (CIL)

RESEARCH · arXiv cs.IR (Information Retrieval) English(EN) · 1w · [15 sources]

UniNote: A Unified Embedding Model for Multimodal Representation and Ranking

Researchers have introduced several new frameworks and benchmarks for multimodal retrieval tasks. Dynamic Adapter Routing (DAR) addresses continual multimodal retrieval by using prototype-based routing for adapter selection. V-SPLADE offers an inference-free sparse retriever for visual documents, improving lexical grounding with caption-gated token supervision. HiKEY proposes a hierarchical retrieval framework for document question answering, leveraging document structure for better routing and evidence integration. Additionally, DeepImageSearch frames image retrieval as an autonomous exploration task within visual histories, introducing a new benchmark (DISBench) to evaluate agentic reasoning.
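As a rough picture of what prototype-based routing for adapter selection can look like, the sketch below keeps one prototype vector per adapter (the mean of that task's embeddings) and sends each query to the adapter whose prototype is most similar by cosine similarity. This is a generic nearest-prototype scheme, not necessarily DAR's actual rule; the adapter names and vectors are made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors, used as a task prototype."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def route(query, prototypes):
    """Return the name of the adapter whose prototype is closest to the query."""
    return max(prototypes, key=lambda name: cosine(query, prototypes[name]))


if __name__ == "__main__":
    # Hypothetical adapters with toy 2-D embeddings from their training tasks.
    prototypes = {
        "text_image": mean_vector([[1.0, 0.0], [0.9, 0.1]]),
        "video": mean_vector([[0.0, 1.0], [0.1, 0.9]]),
    }
    print(route([0.8, 0.2], prototypes))
```

Because each new task only adds a prototype and an adapter, earlier adapters stay untouched, which is what makes this kind of routing attractive in a continual setting.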

IMPACT These advancements offer improved methods for searching and understanding complex multimodal data, potentially accelerating research and application development in areas like document analysis and visual question answering.
