Two new research papers apply differential privacy to bandit problems. The first introduces an algorithm for extensive-form bandit problems that achieves local differential privacy with a regret bound of \(\tilde{O}(\sqrt{A\ln(S)T}/\epsilon)\). The second proposes a fully distributed algorithm for max-min fair multi-agent bandits that preserves reward privacy while achieving a polynomial dependence on the number of agents and a near-logarithmic dependence on the horizon.
Summary written by gemini-2.5-flash-lite from 2 sources.
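The summary does not describe either paper's algorithm, so neither is reproduced here. As a generic illustration of what local differential privacy means for bandit feedback, the following Python sketch assumes Bernoulli rewards in [0, 1]. Each reward passes through the Laplace mechanism with scale 1/ε before the learner sees it (a reward in [0, 1] has sensitivity 1), and a UCB learner then runs on the noisy rewards. The names `privatize_reward` and `private_ucb`, the Bernoulli arms, and the `(1 + 1/epsilon)` bonus inflation are hypothetical choices made for illustration, not details taken from either paper.

```python
import math
import random


def privatize_reward(reward: float, epsilon: float, rng: random.Random) -> float:
    """Laplace mechanism on one reward in [0, 1]; sensitivity is 1, so scale = 1/epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not 0.0 <= reward <= 1.0:
        raise ValueError("reward must lie in [0, 1]")
    scale = 1.0 / epsilon
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return reward + noise


def private_ucb(means, horizon, epsilon, seed=0):
    """UCB run on locally privatized rewards; returns the pull count of each arm.

    Hypothetical sketch: the learner only ever sees noisy rewards.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    noisy_sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialise its estimate
        else:
            def index(a):
                # Heuristic: widen the usual UCB bonus by (1 + 1/epsilon) to absorb
                # the extra Laplace noise; not a tuned constant from either paper.
                bonus = math.sqrt(2.0 * math.log(t) / counts[a]) * (1.0 + 1.0 / epsilon)
                return noisy_sums[a] / counts[a] + bonus

            arm = max(range(k), key=index)
        # Bernoulli reward drawn on the "user" side, privatized before it is reported.
        reward = 1.0 if rng.random() < means[arm] else 0.0
        noisy_sums[arm] += privatize_reward(reward, epsilon, rng)
        counts[arm] += 1
    return counts


counts = private_ucb([0.2, 0.8], horizon=5000, epsilon=1.0, seed=0)
print(counts)  # most pulls should go to the second (better) arm
```

A larger `epsilon` shrinks the noise and moves the sketch toward ordinary UCB. A smaller `epsilon` makes the arms harder to separate, in line with the 1/ε factor in the regret bound quoted above.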
IMPACT These papers advance the theoretical understanding of privacy in multi-agent reinforcement learning settings.
RANK_REASON Two arXiv papers present novel algorithms for privacy-preserving bandit problems.