Model Parallelism With Subnetwork Data Parallelism
Researchers have introduced Subnetwork Data Parallelism (SDP), a distributed training framework aimed at the high memory demands and communication costs of pre-training large neural networks. SDP partitions a model into structured subnetworks that are trained across workers without exchanging activations, which reduces per-device memory usage. The framework combines backward and forward masking with neuron-level or block-level subnetwork construction, and is reported to deliver efficiency gains and improved performance in FLOP-matched settings.
AI IMPACT Reduces memory requirements for training large models, potentially enabling more efficient development and deployment of AI.
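To make the neuron-level construction mentioned above more concrete, here is a minimal, illustrative sketch in plain Python. It is not the authors' implementation: the function names, the random assignment of neurons to workers, and the rule of averaging each neuron's gradient only over the workers that hold it are all assumptions made for illustration, since the summary does not spell out these details.

```python
import random


def make_neuron_masks(num_neurons, num_workers, keep_fraction, seed=0):
    """Hypothetical helper: give each worker a random subset of hidden neurons.

    Returns one 0/1 mask per worker (1 = the worker keeps that neuron).
    Random assignment is an assumption, not something the summary specifies.
    """
    rng = random.Random(seed)
    keep = max(1, round(keep_fraction * num_neurons))
    masks = []
    for _ in range(num_workers):
        kept = set(rng.sample(range(num_neurons), keep))
        masks.append([1 if i in kept else 0 for i in range(num_neurons)])
    return masks


def aggregate_masked_grads(grads, masks):
    """Hypothetical helper: average each neuron's gradient over its owners.

    grads[w][i] is worker w's gradient for neuron i; entries outside the
    worker's mask are ignored. Neurons held by no worker get a zero update.
    """
    num_workers = len(masks)
    num_neurons = len(masks[0])
    averaged = []
    for i in range(num_neurons):
        owners = [w for w in range(num_workers) if masks[w][i]]
        if owners:
            averaged.append(sum(grads[w][i] for w in owners) / len(owners))
        else:
            averaged.append(0.0)
    return averaged
```

In this sketch each worker only needs parameters, gradients, and optimizer state for the neurons in its own mask, which is where the per-device memory saving would come from. Only per-neuron gradients are combined across workers, so no activations are exchanged.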