New Q-learning algorithm robust to corrupted rewards

By PulseAugur Editorial · [1 source] · 2026-05-22 04:00

Researchers have developed a variant of Q-learning designed to handle adversarially corrupted rewards in discounted, infinite-horizon reinforcement learning. The algorithm is analyzed under asynchronous sampling and comes with finite-time robustness guarantees. Its performance matches existing bounds up to an additive term related to the corrupted samples, and the authors also establish an information-theoretic lower bound indicating that the rate is near-optimal.
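
To make the setting concrete, here is a minimal Python sketch of asynchronous Q-learning with corrupted rewards. It is not the paper's algorithm: the two-state environment, the random (rather than fully adversarial) corruption model, the constant step size, and the sliding-window median filter are all assumptions chosen for illustration, and the function name `q_learning` is hypothetical.

```python
import random
import statistics
from collections import deque


def q_learning(robust, steps=40_000, corruption_prob=0.2, seed=0):
    """Asynchronous Q-learning on a toy 2-state, 2-action MDP.

    Returns the greedy policy (one action per state) after training.
    """
    rng = random.Random(seed)
    n_states, n_actions = 2, 2
    gamma, alpha = 0.9, 0.01
    # Hypothetical toy model: action a always leads to state a,
    # and action 1 is the only one that earns a (clean) reward.
    true_reward = [[0.0, 1.0], [0.0, 1.0]]
    q = [[0.0] * n_actions for _ in range(n_states)]
    # Sliding window of recently observed rewards per (state, action).
    window = [[deque(maxlen=51) for _ in range(n_actions)]
              for _ in range(n_states)]

    state = 0
    for _ in range(steps):
        action = rng.randrange(n_actions)  # uniform behavior policy
        next_state = action
        observed = true_reward[state][action]
        if rng.random() < corruption_prob:
            # Assumed corruption: inflate the bad action, punish the good one.
            observed = 20.0 if action == 0 else -20.0

        if robust:
            # Stand-in robust step: use the window median, not the raw sample.
            window[state][action].append(observed)
            reward = statistics.median(window[state][action])
        else:
            reward = observed

        # Asynchronous update: only the visited (state, action) entry changes.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state

    return [max(range(n_actions), key=lambda a: q[s][a])
            for s in range(n_states)]
```

Calling `q_learning(robust=False)` and `q_learning(robust=True)` compares the two variants on the same trajectory of random draws. In this toy setup the median filter is expected to recover the reward-earning action, while the unfiltered update tends to be steered toward the other one. A median is only one of several simple robust estimators and is used here purely because it is easy to read.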

IMPACT Introduces a more robust reinforcement learning algorithm, potentially improving reliability in real-world applications where reward signals may be noisy or manipulated.

RANK_REASON Academic paper detailing a new algorithm with theoretical guarantees.

paper
safety

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.LG TIER_1 English(EN) · Sreejeet Maity, Aritra Mitra · 2026-05-22 04:00

Corruption-Tolerant Asynchronous Q-Learning with Near-Optimal Rates

arXiv:2509.08933v2 Announce Type: replace Abstract: We study the problem of learning the optimal policy in a discounted, infinite-horizon reinforcement learning (RL) setting in the presence of adversarially corrupted rewards. To address this problem, we develop a novel robust var…
