Brief · PulseAugur

TOOL · arXiv cs.AI · English (EN) · 9h

Model Parallelism With Subnetwork Data Parallelism

Researchers have proposed Subnetwork Data Parallelism (SDP), a distributed training framework aimed at the high memory demands and communication costs of pre-training large neural networks. SDP partitions a model into structured subnetworks that are trained across workers without exchanging activations, which significantly reduces per-device memory usage. The framework uses backward and forward masking, combined with neuron-level or block-level subnetwork construction, and reports efficiency gains and improved performance in FLOP-matched settings.
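To make the idea of neuron-level subnetworks concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the random mask assignment, the function names `build_neuron_masks` and `masked_average`, and the rule of averaging each neuron's gradient only over the workers that train it are all assumptions chosen for illustration.

```python
import random

def build_neuron_masks(num_neurons, num_workers, keep_fraction, seed=0):
    """Give each worker a 0/1 mask over hidden neurons.

    Hypothetical scheme: every worker keeps a random subset of neurons.
    """
    rng = random.Random(seed)
    keep = max(1, round(keep_fraction * num_neurons))
    masks = []
    for _ in range(num_workers):
        kept = set(rng.sample(range(num_neurons), keep))
        masks.append([1 if i in kept else 0 for i in range(num_neurons)])
    return masks

def masked_average(grads, masks):
    """Average each neuron's gradient over only the workers that train it."""
    num_neurons = len(masks[0])
    averaged = []
    for i in range(num_neurons):
        owners = [w for w in range(len(masks)) if masks[w][i]]
        if owners:
            averaged.append(sum(grads[w][i] for w in owners) / len(owners))
        else:
            averaged.append(0.0)  # no worker covers this neuron this step
    return averaged

# Toy run: 4 workers, each training half of 8 hidden neurons.
masks = build_neuron_masks(num_neurons=8, num_workers=4, keep_fraction=0.5)
grads = [[1.0] * 8 for _ in masks]  # stand-in per-worker gradients
avg = masked_average(grads, masks)
```

In this toy setup each worker only needs to hold and update the neurons its mask selects, and workers exchange gradients for parameters rather than activations. That is the property the summary attributes to SDP, though the actual masking and aggregation rules may differ.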

IMPACT Reduces memory requirements for training large models, potentially enabling more efficient development and deployment of AI.

LLaMA
ResNet-18
FineWeb
CIFAR
Vaibhav Singh
Subnetwork Data Parallelism (SDP)