PulseAugur

FedQueue protocol improves federated learning across HPC facilities

Researchers have developed FedQueue, a protocol for federated learning (FL) across multiple high-performance computing (HPC) facilities. It targets the stochastic admission delays imposed by batch schedulers, which dominate wall-clock time: synchronous FL stalls on stragglers, while asynchronous FL accumulates stale updates when queues spike. FedQueue predicts queue delays, buffers late-arriving updates, and applies staleness-aware aggregation to stabilize training. The authors report a 20.5% improvement in real-world deployments.
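The summary names FedQueue's three mechanisms but not their formulas, so the Python below is only a minimal sketch under stated assumptions, not the paper's algorithm. It assumes an exponential moving average per facility for queue-delay prediction, a fixed-size buffer that must fill before the server aggregates (in the spirit of buffered asynchronous FL), and a polynomial staleness discount (1 + s)^-alpha, where s is how many global versions old an update is. All class names and the parameters `beta`, `alpha`, and `buffer_size` are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: NOT FedQueue's published method, only a generic
# illustration of queue-delay prediction, buffering, and staleness weighting.


@dataclass
class QueueDelayPredictor:
    """Per-facility exponential moving average of observed queue waits (seconds)."""
    beta: float = 0.3  # assumed smoothing factor
    estimates: dict = field(default_factory=dict)

    def observe(self, facility: str, wait_s: float) -> None:
        prev = self.estimates.get(facility)
        # First observation seeds the estimate; later ones are blended in.
        self.estimates[facility] = (
            wait_s if prev is None else (1 - self.beta) * prev + self.beta * wait_s
        )

    def predict(self, facility: str, default_s: float = 0.0) -> float:
        return self.estimates.get(facility, default_s)


@dataclass
class Update:
    facility: str
    base_version: int       # global model version the facility trained from
    delta: list[float]      # flattened model update
    num_samples: int


def staleness_weight(staleness: int, alpha: float = 0.5) -> float:
    """Polynomial discount (1 + s)^-alpha; an assumed, commonly used choice."""
    return (1.0 + staleness) ** (-alpha)


class BufferedAggregator:
    def __init__(self, dim: int, buffer_size: int = 3, alpha: float = 0.5):
        self.model = [0.0] * dim
        self.version = 0
        self.buffer: list[Update] = []
        self.buffer_size = buffer_size
        self.alpha = alpha

    def receive(self, update: Update) -> bool:
        """Buffer an arriving update; aggregate once full. Returns True if the model changed."""
        self.buffer.append(update)
        if len(self.buffer) < self.buffer_size:
            return False
        self._aggregate()
        return True

    def _aggregate(self) -> None:
        # Weight each update by sample count times its staleness discount.
        weights = [
            u.num_samples * staleness_weight(self.version - u.base_version, self.alpha)
            for u in self.buffer
        ]
        total = sum(weights)
        # Apply the weighted-average delta (server step size of 1).
        for i in range(len(self.model)):
            self.model[i] += sum(w * u.delta[i] for w, u in zip(weights, self.buffer)) / total
        self.version += 1
        self.buffer.clear()


if __name__ == "__main__":
    agg = BufferedAggregator(dim=2, buffer_size=2)
    agg.receive(Update("site-a", base_version=0, delta=[1.0, 0.0], num_samples=100))
    agg.receive(Update("site-b", base_version=0, delta=[0.0, 1.0], num_samples=100))
    print(agg.model, agg.version)  # [0.5, 0.5] 1
```

Buffering trades a little latency for fewer, better-averaged model versions, while the staleness discount keeps an update that sat in a long batch queue from pulling the model as hard as a fresh one.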

Impact: Improves the efficiency of distributed AI training across multiple computing sites.

Ranking rationale: The cluster contains a research paper detailing a new protocol for federated learning.

Read on Hugging Face Daily Papers →

AI-generated summary · Google Gemini · from 1 source.


Sources [1]

  1. Hugging Face Daily Papers · English (EN)

    FedQueue: Queue-Aware Federated Learning for Cross-Facility HPC Training

    Federated learning (FL) across multiple HPC facilities faces stochastic admission delays from batch schedulers that dominate wall-clock time. Synchronous FL suffers from severe stragglers, while asynchronous FL accumulates stale updates when queues spike. We propose FedQueue, a q…