Brief

last 24h

[2/2] 222 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
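The feed does not publish its scoring formula, so the following is only a minimal sketch of how four such signals could be folded into a 0–100 score. The weights, the 24-hour half-life, the multiplicative use of time decay, and the function name `score_story` are all hypothetical, chosen for illustration rather than taken from this site.

```python
# Hypothetical weights for the three quality signals (each signal in [0, 1]).
# These values are assumptions, not the feed's real configuration.
WEIGHTS = {"authority": 0.5, "cluster_strength": 0.25, "headline_signal": 0.25}

# Hypothetical half-life: a story loses half of its score every 24 hours.
HALF_LIFE_HOURS = 24.0


def score_story(authority, cluster_strength, headline_signal, age_hours):
    """Combine three signals in [0, 1] with exponential time decay into 0-100."""
    base = (
        WEIGHTS["authority"] * authority
        + WEIGHTS["cluster_strength"] * cluster_strength
        + WEIGHTS["headline_signal"] * headline_signal
    )
    # Assumed: decay multiplies the weighted sum instead of being a fourth term.
    decay = 0.5 ** (age_hours / HALF_LIFE_HOURS)
    return max(0, min(100, round(100 * base * decay)))
```

Under these assumptions a story that maxes out every signal scores 100 when fresh and 50 after one half-life.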

TOOL · arXiv cs.CV English(EN) · 4d

Distributed Image Compression with Multimodal Side Information at Extremely Low Bitrates

Researchers have proposed a Multimodal Distributed Image Compression (MDIC) framework that improves image reconstruction quality at extremely low bitrates. It uses side information in both textual and visual form to preserve fine-grained local details and improve global perceptual quality. A text-to-image diffusion-based decoder is conditioned on the textual side information, and a feature-mask generator helps exploit the visual side information; the authors report state-of-the-art results on benchmark datasets.

IMPACT This research could enable higher-quality image transmission in bandwidth-constrained environments, with potential applications such as remote sensing and multi-view video conferencing.

RESEARCH · arXiv cs.CV English(EN) · 4d · [2 sources]

Vision Transformers Need Better Token Interaction

Researchers have identified a phenomenon they call "semantic diffusion" that degrades the performance of Vision Transformers (ViTs) on dense prediction tasks over time: global semantic information spreads inappropriately through patch tokens. To address it, the study proposes sparse attention, specifically entmax-1.5, to make token interactions more selective. The authors report that this change improved performance on semantic segmentation benchmarks such as VOC, ADE20K, and Cityscapes while maintaining image-level accuracy.

IMPACT Selective token mixing in Vision Transformers could enhance performance in computer vision tasks like semantic segmentation.
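To make "more selective token interaction" concrete, here is a minimal NumPy sketch of the standard sort-based entmax-1.5 mapping used in place of softmax in a single attention head. It is not the paper's code: the helper name `sparse_attention`, the single-head setup, and the absence of learned projections are assumptions made for illustration.

```python
import numpy as np


def entmax15(scores):
    """Exact entmax-1.5 over the last axis via the sort-based threshold search.

    Returns nonnegative weights that sum to 1; unlike softmax, low-scoring
    entries can receive exactly zero weight.
    """
    z = np.asarray(scores, dtype=np.float64)
    # Shift for numerical stability, then halve (entmax-1.5 works on z / 2).
    z = (z - z.max(axis=-1, keepdims=True)) / 2.0
    z_sorted = -np.sort(-z, axis=-1)  # descending order
    k = np.arange(1, z.shape[-1] + 1, dtype=np.float64)
    mean = np.cumsum(z_sorted, axis=-1) / k
    mean_sq = np.cumsum(z_sorted ** 2, axis=-1) / k
    ss = k * (mean_sq - mean ** 2)
    delta = np.clip((1.0 - ss) / k, 0.0, None)
    tau = mean - np.sqrt(delta)  # candidate threshold for each support size
    support = (tau <= z_sorted).sum(axis=-1, keepdims=True)
    tau_star = np.take_along_axis(tau, support - 1, axis=-1)
    return np.clip(z - tau_star, 0.0, None) ** 2


def sparse_attention(q, k, v):
    """Single-head attention with entmax-1.5 replacing softmax (illustrative)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = entmax15(scores)
    return weights @ v, weights
```

For example, `entmax15([10.0, 0.0])` puts all of the weight on the first entry and exactly zero on the second, whereas softmax would leave every entry with a small nonzero weight. That hard zero is what lets a patch token ignore unrelated tokens entirely.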
