PulseAugur

New OHIRL Framework Learns from Reward-Free Perceptual Streams · Tracking 2 sources

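Researchers have developed OHIRL, a new online reward-punishment learning framework designed for settings in which the environment provides no explicit reward or label. OHIRL infers the valence of perceptual dimensions, such as pain or error, by analyzing the consequences of transitions. The framework separates four roles: next-packet prediction, residual dynamics modeling, trajectory evaluation, and policy update. Experiments on tasks including 2x2-XOR, CartPole, and Taxi show that OHIRL reaches high accuracy in optimal-action selection and reward-sign prediction, outperforming a range of comparison methods.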
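The summary names the four separated roles but does not say how each one is implemented. The Python sketch below is a hypothetical illustration of that separation only, not the OHIRL method: the channel names (`pain`, `energy`), the running-mean predictor, the fixed `VALENCE` weights, and the tabular policy are all assumptions made for this example. In particular, OHIRL infers channel valence from transition consequences, whereas this toy simply hard-codes it.

```python
from collections import defaultdict

# Hypothetical fixed channel layout (illustration only; not the paper's packet format).
CHANNELS = ("pain", "energy")
# Hypothetical per-channel valence weights. OHIRL infers valence from transition
# consequences; this sketch just assumes pain is bad and energy is good.
VALENCE = (-1.0, 1.0)


class NextPacketPredictor:
    """Role 1: predict the next packet for (packet, action) as a running mean."""

    def __init__(self):
        self.sums = defaultdict(lambda: [0.0] * len(CHANNELS))
        self.counts = defaultdict(int)

    def predict(self, packet, action):
        key = (packet, action)
        if self.counts[key] == 0:
            return packet  # no data yet: assume nothing changes
        return tuple(s / self.counts[key] for s in self.sums[key])

    def update(self, packet, action, next_packet):
        key = (packet, action)
        self.counts[key] += 1
        self.sums[key] = [s + x for s, x in zip(self.sums[key], next_packet)]


def residual(predicted, observed):
    """Role 2: the part of an observed transition the predictor did not explain."""
    return tuple(o - p for p, o in zip(predicted, observed))


def evaluate_trajectory(packets):
    """Role 3: score a packet trajectory by valence-weighted channel changes."""
    score = 0.0
    for cur, nxt in zip(packets, packets[1:]):
        score += sum(v * (b - a) for v, a, b in zip(VALENCE, cur, nxt))
    return score


class Policy:
    """Role 4: tabular action preferences nudged toward trajectory scores."""

    def __init__(self, actions, lr=0.5):
        self.actions = actions
        self.lr = lr
        self.prefs = defaultdict(float)

    def act(self, packet):
        # Greedy choice over the current preference table.
        return max(self.actions, key=lambda a: self.prefs[(packet, a)])

    def update(self, packet, action, score):
        key = (packet, action)
        self.prefs[key] += self.lr * (score - self.prefs[key])


if __name__ == "__main__":
    start = (0.0, 1.0)  # (pain, energy)
    policy = Policy(actions=("rest", "touch_fire"))
    predictor = NextPacketPredictor()
    # Hypothetical outcomes: touching fire raises pain, resting restores energy.
    outcomes = {"rest": (0.0, 2.0), "touch_fire": (1.0, 1.0)}
    for action in policy.actions:
        nxt = outcomes[action]
        print(action, "residual:", residual(predictor.predict(start, action), nxt))
        predictor.update(start, action, nxt)
        policy.update(start, action, evaluate_trajectory([start, nxt]))
    print("chosen action:", policy.act(start))
```

Keeping each role behind its own small interface means the predictor, residual model, evaluator, and policy can be swapped independently; in this toy run the agent ends up preferring `rest`, because the assumed valence weights score a pain increase negatively.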

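Impact: Introduces a new approach to reinforcement learning in environments that lack explicit reward signals, potentially extending AI applications to more complex, uncurated data streams.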

Ranking rationale: This cluster contains an arXiv paper detailing a new machine learning framework and its experimental results.

Read on arXiv cs.LG →

AI-generated summary · Google Gemini · from 2 sources.

Coverage sources [2]

  1. arXiv cs.LG TIER_1 English(EN) · Zirong Li

    Online Reward-Punishment Learning from Fixed-Channel Perceptual Event Streams without Environment Rewards

    arXiv:2606.18963v1 Announce Type: new Abstract: We study online reward-punishment learning when the environment provides no scalar reward or evaluative label. At each step the agent receives only a fixed-channel perceptual packet, and quantities such as pain, energy, contact, dam…

  2. arXiv cs.LG TIER_1 English(EN) · Zirong Li

    Online Reward-Punishment Learning from Fixed-Channel Perceptual Event Streams without Environment Rewards

    We study online reward-punishment learning when the environment provides no scalar reward or evaluative label. At each step the agent receives only a fixed-channel perceptual packet, and quantities such as pain, energy, contact, damage, or cognitive error are treated as perceptua…