Brief

last 24h

222 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

TOOL · arXiv cs.LG · English (EN) · 4d

Neural Collapse by Design: Learning Class Prototypes on the Hypersphere

Researchers have introduced two losses, NTCE and NONL, that aim to reach Neural Collapse (NC) more efficiently in supervised classification. They address limitations of existing paradigms such as cross-entropy and supervised contrastive learning. By treating supervised learning as prototype learning on a hypersphere, the new losses converge to NC faster and, according to the authors, improve transfer learning and robustness, especially under class imbalance.

IMPACT Introduces novel losses that accelerate convergence to optimal classification geometry and improve model robustness.
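
To make "prototype learning on a hypersphere" concrete, here is a minimal Python sketch of the geometry usually associated with Neural Collapse: class prototypes arranged as a simplex equiangular tight frame, with a feature assigned to its nearest prototype by cosine similarity. It does not implement NTCE or NONL, whose definitions are not given in this summary; the function names `simplex_etf`, `cosine`, and `nearest_prototype` are hypothetical.

```python
import math

def simplex_etf(num_classes):
    """Build num_classes unit vectors with equal pairwise cosine -1/(num_classes - 1)."""
    k = num_classes
    protos = []
    for i in range(k):
        # Centre the i-th one-hot vector, then rescale it to unit length.
        v = [(1.0 if j == i else 0.0) - 1.0 / k for j in range(k)]
        norm = math.sqrt(sum(x * x for x in v))
        protos.append([x / norm for x in v])
    return protos

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def nearest_prototype(feature, protos):
    """Assign a feature to the class whose prototype has the highest cosine similarity."""
    sims = [cosine(feature, p) for p in protos]
    return max(range(len(protos)), key=lambda c: sims[c])

protos = simplex_etf(4)
# Illustrative feature: prototype 2 plus a small, made-up perturbation.
feature = [x + d for x, d in zip(protos[2], [0.05, -0.02, 0.0, 0.03])]
print(cosine(protos[0], protos[1]))        # close to -1/3 for four classes
print(nearest_prototype(feature, protos))  # index of the closest prototype
```

In this picture every class is equally far from every other class on the unit sphere, which is the target geometry the summary refers to as NC.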

TOOL · arXiv cs.CV · English (EN) · 4d

RobuQ: Pushing DiTs to W1.58A2 via Robust Activation Quantization

Researchers have developed RobuQ, a framework for reducing the compute and memory costs of Diffusion Transformers (DiTs) for image generation. It centers on robust activation quantization, which lets DiTs run at extremely low bit widths: the authors report stable image generation on ImageNet-1K with activations quantized to an average of 2 bits. The framework introduces a RobustQuantizer and an Activation-only Mixed-Precision Network pipeline to address the difficulty of quantizing DiT activations.

IMPACT Enables more efficient deployment of Diffusion Transformers for image generation, potentially lowering hardware requirements.
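
For readers unfamiliar with the W1.58A2 notation, the following Python sketch illustrates what such a setting generally means: weights restricted to three values (log2(3) is about 1.58 bits) and activations mapped onto four levels (2 bits). It is a generic illustration, not RobuQ's RobustQuantizer or its mixed-precision pipeline; the absolute-mean weight scale, the min-max activation range, and the names `ternarize` and `quantize_activations` are all assumptions.

```python
import math

def ternarize(weights):
    """Map weights to codes in {-1, 0, +1} with one shared scale (assumed absmean scaling)."""
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale

def quantize_activations(acts, bits=2):
    """Uniform min-max quantizer: returns integer codes and their dequantized values."""
    steps = 2 ** bits - 1                 # 2 bits -> 3 steps -> 4 representable levels
    lo, hi = min(acts), max(acts)
    step = (hi - lo) / steps or 1.0       # fall back to 1.0 if all activations are equal
    codes = [round((a - lo) / step) for a in acts]
    dequant = [lo + c * step for c in codes]
    return codes, dequant

weights = [0.9, -0.05, -1.2, 0.3]          # made-up example values
acts = [-1.0, -0.2, 0.1, 0.9, 2.0]         # made-up example values

w_codes, w_scale = ternarize(weights)
a_codes, a_deq = quantize_activations(acts, bits=2)

print(math.log2(3))      # bits needed to store one ternary weight
print(w_codes, w_scale)
print(a_codes, a_deq)
```

With only four activation levels, a few outliers can consume most of the range, which is one reason low-bit activation quantization is considered difficult.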

RESEARCH · arXiv cs.CV · English (EN) · 5d · [2 sources]

Slimmable ConvNeXt: Width-Adaptive Inference for Efficient Multi-Device Deployment

Researchers have developed Slimmable ConvNeXt, an approach to building width-adaptive vision models. It trains a single set of weights whose capacity can be adjusted at inference time, so one model can be deployed across different devices and under fluctuating compute budgets. The Slimmable ConvNeXt-T model reaches 80.8% accuracy on ImageNet-1k at 4.5 GMACs, outperforming existing scalable methods such as HydraViT and MatFormer-S.

IMPACT Enables more efficient deployment of vision models across diverse hardware, reducing the need for multiple model versions.
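
The core idea of a width-adaptive layer can be sketched in a few lines of Python: one shared weight matrix, of which only a leading slice is used at narrower widths. This is a simplified illustration under stated assumptions, not the Slimmable ConvNeXt training recipe; it slices only the output units of a single linear layer, and `slim_linear` and `width_ratio` are hypothetical names.

```python
def slim_linear(x, weight, bias, width_ratio):
    """Run one shared linear layer using only the first fraction of its output units.

    Returns the outputs and the number of multiply-accumulate operations used.
    """
    out_full = len(weight)
    out_used = max(1, int(out_full * width_ratio))
    y = []
    for row, b in zip(weight[:out_used], bias[:out_used]):
        y.append(sum(w * xi for w, xi in zip(row, x)) + b)
    macs = out_used * len(x)
    return y, macs

# Made-up 4x3 weight matrix shared by every width setting.
weight = [
    [0.2, -0.1, 0.4],
    [0.0, 0.3, -0.2],
    [0.5, 0.1, 0.1],
    [-0.3, 0.2, 0.6],
]
bias = [0.0, 0.1, -0.1, 0.2]
x = [1.0, 2.0, 3.0]

for ratio in (0.5, 1.0):
    y, macs = slim_linear(x, weight, bias, ratio)
    print(ratio, len(y), macs)
```

Because the narrow configuration reuses a prefix of the full layer, its outputs are identical to the first outputs of the full-width layer, and the cost falls in proportion to the width.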

RESEARCH · arXiv cs.CV · English (EN) · 3d · [2 sources]

Vision Transformers Need Better Token Interaction

Researchers have identified a phenomenon they call "semantic diffusion" that degrades Vision Transformer (ViT) performance on dense prediction tasks over time. It occurs when global semantic information spreads inappropriately through patch tokens. To counter it, the study proposes sparse attention, specifically entmax-1.5, to make token interactions more selective. The authors report significantly better results on semantic segmentation benchmarks such as VOC, ADE20K, and Cityscapes, with image-level accuracy maintained.

IMPACT Selective token mixing in Vision Transformers could enhance performance in computer vision tasks like semantic segmentation.
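
To show why entmax-1.5 makes attention more selective than softmax, here is a small Python sketch that computes both for one row of attention scores. With alpha = 1.5, entmax has the form p_i = max(z_i / 2 - tau, 0)^2, where tau is chosen so the weights sum to 1. The sketch finds tau by bisection for clarity rather than with the exact sort-based algorithm, and it does not reflect how the paper wires entmax into a ViT; the scores are made up.

```python
import math

def softmax(z):
    """Dense attention weights: every entry is strictly positive."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def entmax15(z, iters=60):
    """1.5-entmax by bisection on the threshold tau (a sketch, not an optimized routine)."""
    half = [v / 2.0 for v in z]
    hi = max(half)       # at this threshold the total mass is 0
    lo = hi - 1.0        # at this threshold the total mass is at least 1
    for _ in range(iters):
        tau = (lo + hi) / 2.0
        mass = sum(max(h - tau, 0.0) ** 2 for h in half)
        if mass > 1.0:
            lo = tau
        else:
            hi = tau
    tau = (lo + hi) / 2.0
    p = [max(h - tau, 0.0) ** 2 for h in half]
    total = sum(p)
    return [v / total for v in p]   # renormalize away the residual bisection error

scores = [2.0, 1.0, -1.0]           # made-up attention logits for one query token
print(softmax(scores))              # all three weights are non-zero
print(entmax15(scores))             # the lowest-scoring token gets exactly zero weight
```

Tokens whose scores fall below the threshold receive exactly zero weight, so they contribute nothing to the mixed representation, whereas softmax always leaks some weight to every token.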