New WBMM technique boosts large kernel convolution efficiency

By PulseAugur Editorial · [2 sources] · 2026-07-02 12:33

Researchers have developed a new technique called Windowed Batch Matrix Multiplication (WBMM) to improve the efficiency of large kernel depthwise convolutions. Conventional gather-based implementations lose performance as kernel size grows because of irregular memory access. WBMM instead partitions the input into windows and uses a bias table to construct weight matrices. This turns the computation into a batched matrix multiplication with regular memory access. The authors report three results. Throughput improves with larger windows. Accuracy is comparable or better on ImageNet-1K, COCO, and ADE20K. Training is significantly faster across various hardware platforms.
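The general idea of replacing a sliding gather with windowed matrix products can be illustrated with a minimal NumPy sketch. The sketch makes several assumptions. It uses a 1D depthwise convolution with an odd kernel size and "same" zero padding. It treats the "bias table" as a relative-offset lookup that gathers each window's banded weight matrix from the kernel, which is only an interpretation of the summary. It is not the authors' implementation.

The names `direct_depthwise_conv1d`, `windowed_bmm_depthwise_conv1d`, `window`, and `index_table` are hypothetical. The sketch only shows that the windowed batched matmul reproduces the direct computation.

```python
import numpy as np


def direct_depthwise_conv1d(x, kernels):
    """Reference depthwise 1D cross-correlation with 'same' zero padding.

    x: (C, L) input; kernels: (C, K) per-channel taps, K odd.
    """
    C, L = x.shape
    K = kernels.shape[1]
    pad = K // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    out = np.zeros_like(x)
    for c in range(C):
        for i in range(L):
            # Gather-style access: each output reads its own sliding input slice.
            out[c, i] = np.dot(xp[c, i:i + K], kernels[c])
    return out


def windowed_bmm_depthwise_conv1d(x, kernels, window):
    """Same result computed as one batched matmul over (channel, window) pairs.

    Hypothetical sketch, not the WBMM paper's method: each window of `window`
    outputs reads a haloed input slice of length window + K - 1. A per-channel
    banded weight matrix of shape (window, window + K - 1) is gathered from the
    kernel through a relative-offset index table.
    """
    C, L = x.shape
    K = kernels.shape[1]
    assert K % 2 == 1 and L % window == 0, "assumes odd K and L divisible by window"
    pad = K // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    n_win = L // window
    span = window + K - 1

    # Relative-offset table: entry (i, j) is the kernel tap that links output i
    # to haloed input j. Offsets outside [0, K) point at an extra zero slot.
    rel = np.arange(span)[None, :] - np.arange(window)[:, None]  # (window, span)
    index_table = np.where((rel >= 0) & (rel < K), rel, K)

    # Build per-channel weight matrices by indexing the zero-extended kernel.
    kernels_ext = np.concatenate([kernels, np.zeros((C, 1), kernels.dtype)], axis=1)
    weights = kernels_ext[:, index_table]  # (C, window, span)

    # Extract haloed input windows with a single regular indexing step.
    starts = np.arange(n_win) * window
    cols = starts[:, None] + np.arange(span)[None, :]  # (n_win, span)
    windows = xp[:, cols]  # (C, n_win, span)

    # Batched matmul: (C, 1, window, span) @ (C, n_win, span, 1)
    # broadcasts to (C, n_win, window, 1).
    out = np.matmul(weights[:, None], windows[..., None])[..., 0]
    return out.reshape(C, L)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 32))
    k = rng.standard_normal((4, 7))
    ref = direct_depthwise_conv1d(x, k)
    fast = windowed_bmm_depthwise_conv1d(x, k, window=8)
    print(np.allclose(ref, fast))
```

In this sketch, each banded matrix has window × (window + K - 1) entries, but only window × K of them are nonzero. The sketch therefore spends extra multiply-by-zero work to get contiguous, regular memory access, which batched matmul routines handle well. It does not measure or confirm the throughput figures reported in the paper.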

IMPACT WBMM offers a path to more efficient training and inference for models requiring large receptive fields, potentially improving performance on various hardware.

RANK_REASON Academic paper detailing a new computational technique for deep learning convolutions.



AI-generated summary · Google Gemini · from 2 sources.


COVERAGE [2]

arXiv cs.LG TIER_1 English(EN) · Wan Song, Wei Zhou, Rui Wang, Jun Yu, Toru Kurihara, Jiajia Xu, Shu Zhan · 2026-07-03 04:00

WBMM: Windowed Batch Matrix Multiplication for Efficient Large Receptive Field Convolution

arXiv:2607.02097v1 Announce Type: cross Abstract: Large kernel depthwise convolutions achieve strong performance but suffer from significant degradation as kernel size grows due to irregular memory access from gather-based computation; while Large Kernel Acceleration (LKA) helps …
arXiv cs.LG TIER_1 English(EN) · Shu Zhan · 2026-07-02 12:33

WBMM: Windowed Batch Matrix Multiplication for Efficient Large Receptive Field Convolution

Large kernel depthwise convolutions achieve strong performance but suffer from significant degradation as kernel size grows due to irregular memory access from gather-based computation; while Large Kernel Acceleration (LKA) helps on small feature maps, it becomes counterproductiv…