GF-DiT: Scheduling Parallelism for Diffusion Transformer Serving
Researchers have developed GF-DiT, a runtime system for serving Diffusion Transformers (DiTs), which are increasingly used for image and video generation. Unlike existing systems that use static parallelism, GF-DiT adjusts GPU parallelism at run time based on workload demands and service objectives. It does this through two abstractions: an asynchronous execution abstraction and a communication abstraction called group-free collectives. Together they enable efficient online GPU reallocation and reduce communication overhead.
AI IMPACT: This research could improve the efficiency and reduce the latency of AI-powered image and video generation services.
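To make the contrast with static parallelism concrete, here is a minimal Python sketch of choosing a parallelism degree per request from the current load and a latency objective. It is not GF-DiT's scheduler or API: the `Request` fields, the `choose_degree` function, the candidate degrees, and the toy scaling model with a fixed `efficiency` factor are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    single_gpu_latency_s: float  # hypothetical profiled latency on one GPU
    slo_s: float                 # latency objective for this request


def estimated_latency(single_gpu_latency_s: float, degree: int,
                      efficiency: float = 0.8) -> float:
    """Toy scaling model (an assumption, not a measured curve): each GPU
    beyond the first contributes `efficiency` of a full GPU's speedup."""
    return single_gpu_latency_s / (1.0 + efficiency * (degree - 1))


def choose_degree(req: Request, free_gpus: int,
                  candidates: tuple = (1, 2, 4, 8)) -> int:
    """Return the smallest candidate degree that meets the request's
    latency objective with the GPUs currently free. If none meets it,
    return the largest degree that fits; return 0 if no GPU is free."""
    feasible = [d for d in candidates if d <= free_gpus]
    if not feasible:
        return 0  # nothing free: the request has to wait
    for d in feasible:  # candidates are in ascending order
        if estimated_latency(req.single_gpu_latency_s, d) <= req.slo_s:
            return d
    return feasible[-1]
```

Under these assumptions, a short image request such as `Request("img", 4.0, 5.0)` is given one GPU, while `Request("video", 40.0, 15.0)` is given four when eight are free and falls back to two when only two are free. A static configuration would assign the same degree in all three cases.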