WBMM: Windowed Batch Matrix Multiplication for Efficient Large Receptive Field Convolution

New WBMM technique makes large convolution kernels more efficient

By the PulseAugur editorial team · [1 source] · 2026-07-03 04:00

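Researchers have developed a technique called Windowed Batch Matrix Multiplication (WBMM) to make large-kernel depthwise convolution more efficient. Conventional gather-based approaches slow down as kernel size grows because of irregular memory access. WBMM instead splits the input into windows and builds weight matrices from a bias table, so the computation becomes a batched matrix multiplication with regular memory access. The method shows higher throughput at larger window sizes, reaches comparable or better accuracy on benchmarks such as ImageNet-1K, COCO, and ADE20K, and delivers significant training speedups across a range of hardware platforms.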
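To make the windowing idea concrete, here is a minimal NumPy sketch of one way such a scheme could work. It is not the authors' implementation: the function names `relative_index` and `wbmm_depthwise`, the table layout of `(2*ws-1)**2` entries per channel, and the relative-offset indexing are all assumptions made for illustration. Each window of `ws x ws` positions gets a dense per-channel weight matrix looked up from the table, and all windows are then processed with one batched matrix multiplication.

```python
import numpy as np

def relative_index(ws):
    """Hypothetical helper: table slot for every (output, input) pair in a ws x ws window."""
    coords = np.stack(np.meshgrid(np.arange(ws), np.arange(ws), indexing="ij"))
    flat = coords.reshape(2, -1)                 # (2, N) with N = ws * ws
    rel = flat[:, None, :] - flat[:, :, None]    # (2, N, N): input offset minus output offset
    rel += ws - 1                                # shift offsets into 0 .. 2*ws-2
    return rel[0] * (2 * ws - 1) + rel[1]        # (N, N) flat index into the table

def wbmm_depthwise(x, table, ws):
    """Windowed depthwise mixing via batched matmul (illustrative sketch only).

    x:     (C, H, W) input, H and W divisible by ws
    table: (C, (2*ws-1)**2) per-channel weights, one per relative offset (assumed layout)
    """
    C, H, W = x.shape
    N = ws * ws
    # Split into non-overlapping windows: (C, num_windows, N)
    xw = x.reshape(C, H // ws, ws, W // ws, ws).transpose(0, 1, 3, 2, 4).reshape(C, -1, N)
    # Build one dense (N, N) weight matrix per channel from the table
    weight = table[:, relative_index(ws)]        # (C, N, N), rows = output position
    # Batched matmul over channels: every window is multiplied by the same matrix
    yw = np.matmul(xw, weight.transpose(0, 2, 1))  # (C, num_windows, N)
    # Merge windows back to (C, H, W)
    return yw.reshape(C, H // ws, W // ws, ws, ws).transpose(0, 1, 3, 2, 4).reshape(C, H, W)
```

Under these assumptions, each window behaves like a depthwise convolution with a `(2*ws-1) x (2*ws-1)` kernel that is zero-padded at the window border, but the inner loop is a dense matrix product over contiguous data rather than a gather.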

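Impact: WBMM offers a more efficient path to training and inference for models that need large receptive fields, and could improve performance across a range of hardware.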

Ranking rationale: an academic paper detailing a new computational technique for convolution in deep learning.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.LG · Wan Song, Wei Zhou, Rui Wang, Jun Yu, Toru Kurihara, Jiajia Xu, Shu Zhan · 2026-07-03 04:00

WBMM: Windowed Batch Matrix Multiplication for Efficient Large Receptive Field Convolution

arXiv:2607.02097v1 Announce Type: cross Abstract: Large kernel depthwise convolutions achieve strong performance but suffer from significant degradation as kernel size grows due to irregular memory access from gather-based computation; while Large Kernel Acceleration (LKA) helps …
