PulseAugur

FedQueue protocol improves federated learning across HPC facilities

Researchers have developed FedQueue, a protocol for federated learning across multiple high-performance computing (HPC) facilities. It targets the stochastic admission delays introduced by batch schedulers, which cause severe stragglers in synchronous training and stale updates in asynchronous training. FedQueue predicts queue delays, buffers late arrivals, and applies staleness-aware aggregation to stabilize training. The reported result is a 20.5% improvement in real-world deployments.
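To make the idea of staleness-aware aggregation concrete, here is a minimal Python sketch. It is not FedQueue's actual rule, which the summary does not specify. It assumes a simple polynomial discount, `(1 + staleness) ** -alpha`, and the names `Update`, `staleness_weight`, `aggregate`, and `alpha` are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Update:
    """One facility's contribution (hypothetical structure for this sketch)."""
    weights: list[float]  # flattened model parameters
    round_sent: int       # global round the facility trained from


def staleness_weight(staleness: int, alpha: float = 0.5) -> float:
    """Assumed polynomial discount: fresh updates get 1.0, older ones less."""
    return (1.0 + max(0, staleness)) ** (-alpha)


def aggregate(updates: list[Update], current_round: int, alpha: float = 0.5) -> list[float]:
    """Average buffered updates, down-weighting those that arrived late."""
    if not updates:
        raise ValueError("no updates to aggregate")
    coeffs = [staleness_weight(current_round - u.round_sent, alpha) for u in updates]
    total = sum(coeffs)
    dim = len(updates[0].weights)
    # Normalized weighted average, computed per parameter.
    return [
        sum(c * u.weights[i] for c, u in zip(coeffs, updates)) / total
        for i in range(dim)
    ]


if __name__ == "__main__":
    fresh = Update(weights=[1.0, 1.0], round_sent=10)
    late = Update(weights=[0.0, 0.0], round_sent=7)  # delayed in a batch queue
    print(aggregate([fresh, late], current_round=10))
```

With these assumptions, an update that waited three rounds in a scheduler queue counts half as much as a fresh one, so a queue spike at one site shifts the global model less than it would under plain averaging.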

IMPACT Improves efficiency for distributed AI training across multiple computing sites.

RANK_REASON The cluster contains a research paper detailing a new protocol for federated learning.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. Hugging Face Daily Papers · TIER_1 · English (EN)

    FedQueue: Queue-Aware Federated Learning for Cross-Facility HPC Training

    Federated learning (FL) across multiple HPC facilities faces stochastic admission delays from batch schedulers that dominate wall-clock time. Synchronous FL suffers from severe stragglers, while asynchronous FL accumulates stale updates when queues spike. We propose FedQueue, a q…