Brief · PulseAugur

TOOL · arXiv cs.LG · English (EN) · 19h

DCC: Data-Centric Compilation of Machine Learning Kernels for Processing-In-Memory Architectures

Researchers have developed DCC, a data-centric compiler that optimizes machine learning kernels for Processing-In-Memory (PIM) architectures. Data rearrangement and compute code optimization are interdependent, so DCC optimizes the two jointly. It supports multiple PIM backends through a multi-layer abstraction and achieves speedups of up to 7.68x on HBM-PIM and up to 13.17x on AttAcc PIM compared to GPU-only execution. In end-to-end LLM inference, DCC on AttAcc accelerated GPT-3 and LLaMA-2 by an average of 4.52x.
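Why layout and compute are coupled can be seen in a toy matrix-vector product (GEMV) spread across PIM banks. The sketch below is a hypothetical illustration, not DCC's algorithm or API: the functions `row_partition`, `col_partition`, `gemv_row_layout` and `gemv_col_layout` are invented for this example, and "banks" are plain Python lists. Splitting the weight matrix by rows lets each bank produce finished outputs, while splitting it by columns forces every bank to emit partial sums that must be combined across banks. The chosen data layout therefore dictates what compute code each bank can run.

```python
# Hypothetical toy model (not DCC's method): shows how the data layout chosen
# for PIM banks determines the compute each bank can perform locally.

def row_partition(W, num_banks):
    """Place contiguous row blocks of W in each bank."""
    per_bank = -(-len(W) // num_banks)  # ceiling division
    return [W[b * per_bank:(b + 1) * per_bank] for b in range(num_banks)]

def col_partition(W, num_banks):
    """Place contiguous column blocks of W in each bank."""
    per_bank = -(-len(W[0]) // num_banks)
    return [[row[b * per_bank:(b + 1) * per_bank] for row in W]
            for b in range(num_banks)]

def gemv_row_layout(banks, x):
    # Each bank holds whole rows, so it computes complete dot products;
    # outputs are concatenated and no partial sums leave the banks.
    y = []
    for block in banks:
        y.extend(sum(w * v for w, v in zip(row, x)) for row in block)
    return y, 0

def gemv_col_layout(banks, x):
    # Each bank holds only a slice of columns, so it emits one partial sum
    # per output element; with more than one bank these must be summed
    # across banks. The counter tallies those emitted partial sums.
    y = [0] * len(banks[0])
    partial_sums = 0
    offset = 0
    for block in banks:
        width = len(block[0])
        x_slice = x[offset:offset + width]
        for i, row in enumerate(block):
            y[i] += sum(w * v for w, v in zip(row, x_slice))
            partial_sums += 1
        offset += width
    return y, partial_sums

W = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
x = [1, 1, 1, 1]
print(gemv_row_layout(row_partition(W, 2), x))  # ([10, 26], 0)
print(gemv_col_layout(col_partition(W, 2), x))  # ([10, 26], 4)
```

Both layouts yield the same result, but the column layout generates four cross-bank partial sums for a two-element output. A compiler that picks the layout without considering the kernel, or the kernel without considering the layout, can miss this trade-off, which is the interdependence the brief describes.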

IMPACT Enables significant acceleration for LLM inference and other ML workloads on specialized Processing-In-Memory hardware.

GPT-3
LLaMA-2
Machine Learning
DCC
Processing-In-Memory
HBM-PIM
AttAcc PIM