Near-Optimal Regret for Distributed Adversarial Bandits: A Black-Box Approach
Researchers have developed a new approach to distributed adversarial bandits that improves on previous regret bounds. The method is a black-box reduction to bandits with delayed feedback and requires only gossip-based communication among agents. The resulting algorithm achieves a significantly better regret upper bound than prior work and is complemented by a matching lower bound, which shows that the problem decomposes into a communication cost and a bandit cost. The framework is also versatile: applied to distributed linear bandits, it yields bounds with reduced communication overhead.
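
To make the link between gossip communication and delayed feedback more concrete, here is a minimal sketch of plain gossip averaging. It is not the paper's algorithm: the ring topology, the uniform averaging weights, and the names `ring_neighbors`, `gossip_round`, and `run_gossip` are all assumptions chosen for illustration. The point is only that each agent needs several communication rounds before its local estimate approaches the network-wide average, so any bandit learner consuming that estimate effectively sees its feedback late.

```python
def ring_neighbors(n):
    """Hypothetical topology: agent i talks to its two neighbors on a ring."""
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]


def gossip_round(values, neighbors):
    """One synchronous gossip step: each agent replaces its value with the
    uniform average of its own value and its neighbors' values.
    On a regular graph such as a ring this preserves the network-wide mean."""
    new_values = []
    for i, v in enumerate(values):
        group = [v] + [values[j] for j in neighbors[i]]
        new_values.append(sum(group) / len(group))
    return new_values


def run_gossip(values, neighbors, rounds):
    """Apply `rounds` gossip steps and return the agents' final estimates."""
    for _ in range(rounds):
        values = gossip_round(values, neighbors)
    return values


# Illustrative data: only agent 0 observed a loss of 1.0 in this round.
local_losses = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
neighbors = ring_neighbors(len(local_losses))
true_mean = sum(local_losses) / len(local_losses)

# After one round, distant agents still know nothing about agent 0's loss.
after_one = run_gossip(local_losses, neighbors, rounds=1)

# After many rounds, every agent's estimate is close to the true average.
estimates = run_gossip(local_losses, neighbors, rounds=50)
max_error = max(abs(e - true_mean) for e in estimates)
```

In this toy setup the number of rounds needed for an accurate average plays the role of the feedback delay, which gives some intuition for why a delayed-feedback bandit algorithm can be plugged in as a black box. How the delay depends on the network in the actual work is not specified in the summary above.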