FreeScale: Distributed Training for Sequence Recommendation Models with Minimal Scaling Cost
A new paper introduces FreeScale, a method for making distributed training of sequence recommendation models more efficient. FreeScale targets computational bottlenecks caused by stragglers and slow communication. It balances the load of input samples across workers and overlaps communication with computation. It also uses SM-Free methods to manage GPU resource contention, and it reportedly reduces computational bubbles by over 90% on 256 H100 GPUs.
IMPACT: Optimizes distributed training for recommendation models, potentially reducing compute costs and training time.
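To see why load-balanced input samples reduce stragglers, consider a minimal sketch. It assumes each sample's cost is proportional to its sequence length and uses a greedy longest-first (LPT) heuristic to spread samples across ranks. This is not FreeScale's published algorithm. The function names `balance_samples` and `rank_loads` are hypothetical, and real attention-based models may have costs that grow faster than linearly with length.

```python
import heapq
from typing import List, Tuple


def balance_samples(seq_lens: List[int], num_ranks: int) -> List[List[int]]:
    """Assign sample indices to ranks so per-rank total cost is roughly equal.

    Hypothetical helper. Assumes cost is proportional to sequence length.
    """
    # Min-heap of (current_load, rank_id): always fill the least-loaded rank next.
    heap: List[Tuple[int, int]] = [(0, r) for r in range(num_ranks)]
    heapq.heapify(heap)
    assignment: List[List[int]] = [[] for _ in range(num_ranks)]
    # Place the longest samples first (longest-processing-time heuristic).
    order = sorted(range(len(seq_lens)), key=lambda i: seq_lens[i], reverse=True)
    for idx in order:
        load, rank = heapq.heappop(heap)
        assignment[rank].append(idx)
        heapq.heappush(heap, (load + seq_lens[idx], rank))
    return assignment


def rank_loads(seq_lens: List[int], assignment: List[List[int]]) -> List[int]:
    """Total (assumed) cost per rank for a given assignment."""
    return [sum(seq_lens[i] for i in idxs) for idxs in assignment]


if __name__ == "__main__":
    lens = [100, 90, 50, 40, 30, 20, 10, 10]
    naive = [[0, 1, 2, 3], [4, 5, 6, 7]]  # contiguous split, ignoring lengths
    balanced = balance_samples(lens, num_ranks=2)
    print(rank_loads(lens, naive))     # → [280, 70]
    print(rank_loads(lens, balanced))  # → [180, 170]
```

In a synchronous step, every rank waits for the slowest one. With the naive split, the fast rank sits idle for most of the step. The balanced split narrows that gap, which is the kind of idle time ("bubble") the summary describes.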