New OHIRL framework learns from reward-free perceptual streams · 2 sources tracked

By PulseAugur Editorial · [2 sources] · 2026-06-17 11:43

A new arXiv paper introduces OHIRL, an online reward-punishment learning framework for settings where the environment provides no scalar reward or evaluative label. Instead of receiving a reward signal, the agent infers the valence of perceptual dimensions such as pain or error from the consequences of transitions. The framework assigns separate roles to next-packet prediction, residual dynamics modeling, trajectory evaluation, and policy updates. In experiments on 2x2-XOR, CartPole, and Taxi, OHIRL reportedly achieved high accuracy in optimal-action selection and reward-sign prediction and outperformed the control methods it was compared against.
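To make the idea of inferring valence from transition consequences concrete, here is a toy Python sketch. It is not the paper's algorithm: the channel names, the 0.5 activation threshold, and the use of episode termination as the observed "consequence" are all assumptions made for illustration, since the summary does not describe the actual rule OHIRL uses.

```python
from collections import defaultdict

# Hypothetical channel layout of a fixed-channel perceptual packet.
CHANNELS = ("pain", "energy", "contact")


class ValenceEstimator:
    """Toy stand-in (an assumption, not OHIRL): a channel is scored
    negative when its activation is followed by episode termination
    more often than the average transition is."""

    def __init__(self, channels):
        self.channels = channels
        self.active = defaultdict(int)           # times each channel was active
        self.active_then_end = defaultdict(int)  # ...and the episode then ended
        self.total = 0
        self.total_end = 0

    def observe(self, packet, terminated_next):
        # packet: one float per channel; no scalar reward is ever passed in.
        self.total += 1
        self.total_end += int(terminated_next)
        for name, value in zip(self.channels, packet):
            if value > 0.5:  # assumed activation threshold
                self.active[name] += 1
                self.active_then_end[name] += int(terminated_next)

    def valence(self, name):
        if self.total == 0 or self.active[name] == 0:
            return 0.0
        base_rate = self.total_end / self.total
        cond_rate = self.active_then_end[name] / self.active[name]
        return base_rate - cond_rate  # below zero means "punishing"


# Made-up stream of (packet, did the episode end on the next step?) pairs.
stream = [
    ((1, 0, 0), True),
    ((1, 0, 1), True),
    ((0, 1, 0), False),
    ((0, 1, 1), False),
    ((0, 0, 0), False),
    ((1, 1, 0), True),
]

est = ValenceEstimator(CHANNELS)
for packet, ended in stream:
    est.observe(packet, ended)

for name in CHANNELS:
    print(name, round(est.valence(name), 3))
```

On this made-up stream the "pain" channel comes out negative, "energy" positive, and "contact" neutral. In a fuller agent, a score like this would feed the trajectory evaluation step, with next-packet prediction and policy updates handled by separate components.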

IMPACT Introduces a novel approach to reinforcement learning in environments lacking explicit reward signals, potentially expanding AI's applicability to more complex, uncurated data streams.

RANK_REASON The cluster contains an arXiv paper detailing a new machine learning framework and its experimental results.


AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv cs.LG TIER_1 English(EN) · Zirong Li · 2026-06-18 04:00

Online Reward-Punishment Learning from Fixed-Channel Perceptual Event Streams without Environment Rewards

arXiv:2606.18963v1 (announce type: new). Abstract: We study online reward-punishment learning when the environment provides no scalar reward or evaluative label. At each step the agent receives only a fixed-channel perceptual packet, and quantities such as pain, energy, contact, dam…

arXiv cs.LG TIER_1 English(EN) · Zirong Li · 2026-06-17 11:43

Online Reward-Punishment Learning from Fixed-Channel Perceptual Event Streams without Environment Rewards

We study online reward-punishment learning when the environment provides no scalar reward or evaluative label. At each step the agent receives only a fixed-channel perceptual packet, and quantities such as pain, energy, contact, damage, or cognitive error are treated as perceptua…

