Brief · PulseAugur

RESEARCH · arXiv cs.LG · English (EN) · 1d · [2 sources]

Online Reward-Punishment Learning from Fixed-Channel Perceptual Event Streams without Environment Rewards

Researchers have developed OHIRL, an online reward-punishment learning framework for settings where the environment provides no explicit rewards or labels. Instead, OHIRL infers the valence (whether a signal is good or bad) of perceptual dimensions such as pain or error from the consequences of state transitions. The framework assigns separate roles to next-packet prediction, residual dynamics modeling, trajectory evaluation, and policy updates. In experiments on 2x2-XOR, CartPole, and Taxi, OHIRL achieved high accuracy in both optimal-action selection and reward-sign prediction, outperforming the control methods it was compared against.
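To make the four-way split of roles concrete, here is a minimal structural sketch in Python. It is an illustration under stated assumptions, not OHIRL's implementation: every class and function name is hypothetical, the XOR-style toy environment is invented here and is not the paper's 2x2-XOR benchmark, "trajectories" are single steps, and the channel valence is hard-coded rather than inferred (inferring it is the part OHIRL reportedly learns). The environment returns only a perceptual packet, never a reward.

```python
import random
from collections import defaultdict

# HYPOTHETICAL sketch: names, toy task, and update rules are illustrative only,
# not taken from the OHIRL paper.
CHANNELS = ("pain",)
# ASSUMPTION: valence signs are hard-coded here; OHIRL reportedly infers them.
VALENCE = {"pain": -1.0}

def toy_env(context, action):
    """Invented XOR-style toy: the 'pain' channel fires when action != a XOR b."""
    a, b = context
    return {"pain": 0.0 if action == (a ^ b) else 1.0}

class NextPacketPredictor:
    """Role 1: running-mean prediction of the next packet per (context, action)."""
    def __init__(self, lr=0.2):
        self.lr = lr
        self.mean = defaultdict(lambda: {c: 0.0 for c in CHANNELS})

    def predict(self, context, action):
        return dict(self.mean[(context, action)])

    def update(self, context, action, packet):
        m = self.mean[(context, action)]
        for c in CHANNELS:
            m[c] += self.lr * (packet[c] - m[c])

class ResidualModel:
    """Role 2 (stand-in): running mean of |observed - predicted| per channel."""
    def __init__(self, lr=0.1):
        self.lr = lr
        self.mean_abs = {c: 0.0 for c in CHANNELS}

    def update(self, predicted, observed):
        for c in CHANNELS:
            err = abs(observed[c] - predicted[c])
            self.mean_abs[c] += self.lr * (err - self.mean_abs[c])

class TrajectoryEvaluator:
    """Role 3: valence-weighted sum of channel values over a list of packets."""
    def score(self, packets):
        return sum(VALENCE[c] * p[c] for p in packets for c in CHANNELS)

class Policy:
    """Role 4: tabular action values with epsilon-greedy selection."""
    def __init__(self, n_actions=2, lr=0.5, epsilon=0.1, seed=0):
        self.values = defaultdict(lambda: [0.0] * n_actions)
        self.lr, self.epsilon = lr, epsilon
        self.rng = random.Random(seed)

    def greedy(self, context):
        v = self.values[context]
        return max(range(len(v)), key=v.__getitem__)  # ties -> lowest index

    def act(self, context):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values[context]))
        return self.greedy(context)

    def update(self, context, action, value):
        v = self.values[context]
        v[action] += self.lr * (value - v[action])

def run(steps=2000, seed=0):
    rng = random.Random(seed)
    predictor, residual = NextPacketPredictor(), ResidualModel()
    evaluator, policy = TrajectoryEvaluator(), Policy(seed=seed)
    for _ in range(steps):
        context = (rng.randint(0, 1), rng.randint(0, 1))
        action = policy.act(context)
        predicted = predictor.predict(context, action)
        packet = toy_env(context, action)  # no reward, only a perceptual packet
        residual.update(predicted, packet)
        predictor.update(context, action, packet)
        policy.update(context, action, evaluator.score([packet]))
    return predictor, residual, policy

if __name__ == "__main__":
    _, _, policy = run()
    print([policy.greedy((a, b)) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]
```

Keeping prediction, residual tracking, evaluation, and action selection in separate objects mirrors the role separation described above. In this toy, only the evaluator reads valence, so replacing the hard-coded `VALENCE` table with a learned estimate would not require changing the other three components.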

IMPACT Introduces a novel approach to reinforcement learning in environments lacking explicit reward signals, potentially expanding AI's applicability to more complex, uncurated data streams.

Hugging Face
arXiv
CartPole
Taxi
OHIRL
M_psi
D_omega
C'était ici
B Xiao
2x2-XOR