ImageNet-1k
PulseAugur coverage of ImageNet-1k: every cluster mentioning ImageNet-1k across labs, papers, and developer communities, ranked by signal.
Sentiment data available for 3 days
-
Vision Transformers improved with selective token interaction
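Researchers have identified a phenomenon called "semantic diffusion" that degrades the performance of Vision Transformers (ViTs) on dense prediction tasks over time. It occurs when global semantic information diffuses inappropriately through patch tokens. To address this, the study proposes sparse attention mechanisms, specifically entmax-1.5, to make token interactions more selective. This change significantly improves performance on semantic segmentation benchmarks such as VOC, ADE20K, and Cityscapes, while maintaining…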
-
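RobuQ framework enables diffusion Transformers to run at ultra-low bit precision
Researchers have developed RobuQ, a new framework designed to significantly reduce the compute and memory costs of Diffusion Transformers (DiTs) for image generation. The method focuses on robust activation quantization, allowing DiTs to operate at extremely low bit widths; in particular, it achieves stable image generation on ImageNet-1K with activations quantized to an average of 2 bits. The framework introduces new techniques such as RobustQuantizer and an activation-only mixed-precision network pipeline to overcome the challenges of quantizing DiT activations.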
-
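New loss functions accelerate neural collapse in supervised learning
Researchers have introduced two new methods, NTCE and NONL, that improve supervised classification by achieving neural collapse (NC) more effectively. These techniques address limitations of existing paradigms such as cross-entropy and supervised contrastive learning. By treating supervised learning as prototype learning on a hypersphere, the new loss functions converge to NC faster and deliver significant improvements in transfer learning and robustness, especially under class imbalance.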
-
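Slimmable ConvNeXt enables adaptive vision model deployment
Researchers have developed Slimmable ConvNeXt, a new method for creating adaptive vision models. It trains a single set of weights whose capacity can be adjusted dynamically for efficient deployment across a range of devices and changing compute budgets. The Slimmable ConvNeXt-T model reaches 80.8% accuracy on ImageNet-1k at 4.5 GMACs, outperforming existing scalable approaches such as HydraViT and MatFormer-S.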
-
Shapley Neuron Values framework combats AI model forgetting
Researchers have introduced Shapley Neuron Values (SNV), a new framework for continual learning that uses cooperative game theory to identify and preserve the most important neurons in a neural network. This method aims…
-
PODS framework boosts AI model training efficiency by 2x
Researchers have developed a new framework called PODS (Plug-and-play Oscillatory Data-volume Scheduling) to make model training more efficient. PODS dynamically adjusts the amount of data used during training, alternat…
-
TINS method enhances OOD detection in vision-language models
Researchers have developed TINS, a novel method for Out-of-Distribution (OOD) detection in vision-language models. TINS addresses limitations of static negative labels by learning dynamic negative semantics during test-…
-
bViT uses single-block recurrence for parameter-efficient vision transformers
Researchers have developed bViT, a novel Vision Transformer architecture that utilizes a single transformer block applied repeatedly for image recognition. This recurrent approach achieves accuracy comparable to standar…
-
FractalMamba++ scales vision models across resolutions using Hilbert curves
Researchers have introduced FractalMamba++, an enhanced vision backbone designed to improve the performance of Mamba-based models, particularly with high-resolution inputs. This new architecture leverages the geometric …
-
Researchers optimize Vision Transformers for semiconductor inspection
Researchers have developed a novel framework to optimize Vision Transformers (ViTs) for deployment in resource-constrained industrial settings. This approach simultaneously optimizes architecture, token compression, and…
-
New HyCAS defense bridges gap between certified and empirical adversarial robustness
Researchers have developed a new adversarial defense technique called Hybrid Convolutions with Attention Stochasticity (HyCAS). This method aims to bridge the gap between theoretical robustness guarantees and practical …
-
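Colinearity Decay trains Vision Transformers for better low-bit quantization
Researchers have developed a new training technique called Colinearity Decay (CD) that makes Vision Transformers (ViTs) more amenable to low-bit quantization. The method acts as a structural regularizer, penalizing alignment within Transformer blocks to mitigate harmful activation outliers without affecting the architecture or the task loss. CD aims to improve the accuracy of quantized models while maintaining or improving full-precision performance, offering a route to efficient ViT deployment with no inference-time overhead.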
-
Hyperspherical Forward-Forward algorithm speeds up inference for image classification
Researchers have developed a new algorithm called Hyperspherical Forward-Forward (HFF) that significantly speeds up the inference process of the Forward-Forward (FF) algorithm. By reframing the FF algorithm's local obje…
-
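Vision Transformers use DCT to improve attention and efficiency
Researchers have developed a novel approach that uses the discrete cosine transform (DCT) to enhance Vision Transformers. It includes a DCT-based self-attention initialization strategy that improves classification accuracy on benchmarks such as CIFAR-10 and ImageNet-1K. In addition, a DCT-based attention compression technique reduces computational overhead by truncating the high-frequency components of input patches, while preserving performance in models such as Swin Transformer.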
-
Flow Matching research advances efficiency, control, and applications
Recent research explores advancements in Flow Matching, a generative modeling technique. Several papers introduce new methods to improve its efficiency, controllability, and applicability to diverse data types. Innovati…
-
Researchers adapt self-supervised learning for plant image recognition
Researchers have developed a self-supervised learning approach for plant image recognition, addressing the limitations of traditional supervised methods that require extensive expert-labeled data. The study found that s…
-
Vision SmolMamba uses spike-guided pruning for energy-efficient vision models
Researchers have introduced Vision SmolMamba, a novel energy-efficient spiking state-space architecture designed for visual modeling. This architecture integrates spike-driven dynamics with linear-time selective recurre…
-
New AI methods enhance out-of-distribution detection and representation learning
Researchers have developed UFCOD, a novel framework for few-shot cross-domain out-of-distribution (OOD) detection. UFCOD leverages information-geometric analysis of diffusion trajectories, extracting 'Path Energy' and '…