New OHIRL framework learns from reward-free perceptual streams · Tracking 2 sources

By the PulseAugur editorial team · [2 sources] · 2026-06-17 11:43

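Researchers have developed OHIRL, a new online reward-punishment learning framework for settings in which the environment provides no explicit reward or label. OHIRL infers the valence of perceptual dimensions such as pain or error by analyzing the consequences of transitions. The framework assigns next-packet prediction, residual dynamics modeling, trajectory evaluation, and policy updating to separate roles. In experiments on tasks including 2x2-XOR, CartPole, and Taxi, OHIRL reached high accuracy in both optimal-action selection and reward-sign prediction, outperforming a range of comparison methods.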
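The summary reports accuracy on two measures, optimal-action selection and reward-sign prediction, without saying how they are computed. Below is a minimal Python sketch of how such scores might be calculated, assuming (this is not stated in the source) that each is a simple per-step match rate: the sign of the agent's inferred valence is compared against a reference reward that is used only for evaluation, never shown to the agent, and each chosen action is compared against a known optimal action. The function names `sign_accuracy` and `optimal_action_rate` are hypothetical and do not come from the paper.

```python
from typing import Sequence


def sign(x: float) -> int:
    """Return -1, 0, or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def sign_accuracy(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Fraction of steps where the inferred valence has the same sign as the
    reference reward.

    Assumption (hypothetical metric): the reference reward exists only for
    evaluation; the agent itself never observes it.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(sign(p) == sign(r) for p, r in zip(predicted, reference))
    return hits / len(predicted)


def optimal_action_rate(chosen: Sequence[int], optimal: Sequence[int]) -> float:
    """Fraction of steps where the chosen action equals the known optimal action.

    Assumption (hypothetical metric): an optimal action is known for every
    evaluated step, as it would be in a small task such as 2x2-XOR.
    """
    if len(chosen) != len(optimal) or not chosen:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(c == o for c, o in zip(chosen, optimal)) / len(chosen)


if __name__ == "__main__":
    # Toy numbers for illustration only, not results from the paper.
    print(sign_accuracy([0.7, -0.2, 0.1, -0.9], [1.0, -1.0, -1.0, -1.0]))  # 0.75
    print(optimal_action_rate([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.75
```

Comparing only signs matches the wording "reward-sign prediction": an agent that never sees a scalar reward can at best be judged on whether it correctly separates good outcomes from bad ones, not on matching reward magnitudes.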

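Impact: Introduces a new approach to reinforcement learning in environments that lack explicit reward signals, which could extend AI to more complex, uncurated data streams.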

Ranking rationale: This cluster contains an arXiv paper that details a new machine-learning framework and its experimental results.

Read on arXiv cs.LG

AI-generated summary · Google Gemini · from 2 sources.

Coverage sources [2]

arXiv cs.LG TIER_1 English(EN) · Zirong Li · 2026-06-18 04:00

Online Reward-Punishment Learning from Fixed-Channel Perceptual Event Streams without Environment Rewards

arXiv:2606.18963v1 Announce Type: new Abstract: We study online reward-punishment learning when the environment provides no scalar reward or evaluative label. At each step the agent receives only a fixed-channel perceptual packet, and quantities such as pain, energy, contact, dam…
arXiv cs.LG TIER_1 English(EN) · Zirong Li · 2026-06-17 11:43

Online Reward-Punishment Learning from Fixed-Channel Perceptual Event Streams without Environment Rewards

We study online reward-punishment learning when the environment provides no scalar reward or evaluative label. At each step the agent receives only a fixed-channel perceptual packet, and quantities such as pain, energy, contact, damage, or cognitive error are treated as perceptua…
