Brief · PulseAugur

RESEARCH · arXiv cs.CV · English · 1mo · 2 sources

DPU or GPU for Accelerating Neural Networks Inference -- Why not both? Split CNN Inference

Researchers have developed a method for accelerating neural network inference by splitting Convolutional Neural Network (CNN) computation between a Deep Learning Processing Unit (DPU) and a Graphics Processing Unit (GPU). In this 'Split CNN Inference' approach, the initial layers run on a DPU located near the data source and the remaining layers run on a GPU, which the authors report significantly reduces latency. They also introduce a Graph Neural Network (GNN) model that predicts the optimal layer partition point for various CNN architectures with 96.27% accuracy.
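The brief does not describe how latency is measured or how the GNN represents a network. To illustrate the trade-off behind choosing a split point, here is a minimal sketch. It assumes that per-layer latencies on each device and a fixed DPU-to-GPU link bandwidth are already known, and it exhaustively tries every cut point. This brute-force search is a stand-in for illustration, not the paper's GNN predictor. The `LayerProfile` class, the function names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LayerProfile:
    # Hypothetical per-layer profile; none of these fields come from the paper.
    name: str
    dpu_ms: float   # assumed latency of this layer on the DPU
    gpu_ms: float   # assumed latency of this layer on the GPU
    out_bytes: int  # size of this layer's output activation


def split_latency(layers, k, input_bytes, link_bytes_per_ms):
    """End-to-end latency when layers[:k] run on the DPU and layers[k:] on the GPU.

    k == 0 sends the raw input straight to the GPU; k == len(layers) runs
    everything on the DPU and only ships the final output across the link.
    """
    dpu_ms = sum(layer.dpu_ms for layer in layers[:k])
    gpu_ms = sum(layer.gpu_ms for layer in layers[k:])
    # The tensor crossing the DPU -> GPU link is whatever exists at the cut.
    crossing_bytes = input_bytes if k == 0 else layers[k - 1].out_bytes
    return dpu_ms + crossing_bytes / link_bytes_per_ms + gpu_ms


def best_split(layers, input_bytes, link_bytes_per_ms):
    """Try every cut point (a linear scan) and return (k, latency_ms)."""
    k = min(
        range(len(layers) + 1),
        key=lambda cut: split_latency(layers, cut, input_bytes, link_bytes_per_ms),
    )
    return k, split_latency(layers, k, input_bytes, link_bytes_per_ms)


if __name__ == "__main__":
    # Made-up profile: early conv layers are cheap on the DPU and shrink the
    # activation, while the final layer is much faster on the GPU.
    net = [
        LayerProfile("conv1", dpu_ms=1.0, gpu_ms=2.0, out_bytes=4000),
        LayerProfile("conv2", dpu_ms=1.0, gpu_ms=2.0, out_bytes=1000),
        LayerProfile("fc", dpu_ms=5.0, gpu_ms=0.5, out_bytes=10),
    ]
    print(best_split(net, input_bytes=8000, link_bytes_per_ms=1000))  # (2, 3.5)
```

With these invented numbers, cutting after `conv2` wins. The DPU handles the layers that shrink the data, so only a small activation crosses the link, and the GPU runs the final layer. The sketch assumes a single cut point and sequential execution of one input, and it ignores pipelining across consecutive inputs.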

IMPACT: Potential for reduced latency in edge AI applications through better use of the available hardware for CNN inference.

GNN
CNN
GPU
ResNet50
MobileNetV2
ResNet18
LeNet-5
ResNet101
ResNet152
Versal VCK190
NVIDIA RTX 2080
VGG16