PulseAugur

Markov decision processes: a tool for sequential decision making under uncertainty

PulseAugur coverage of "Markov decision processes: a tool for sequential decision making under uncertainty": every cluster that mentions this entity across labs, papers, and developer communities, ranked by signal.

Total · 30 days: 9 (90 days: 9)
Releases · 30 days: 0 (90 days: 0)
Papers · 30 days: 9 (90 days: 9)
Tier distribution · 90 days
Sentiment · 30 days: 7 days with sentiment data

Recent · Page 1 of 1 · 9 items
  1. TOOL · CL_48775 ·

    New method offers robust counterfactual inference for Markov Decision Processes

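    Researchers have developed a new nonparametric method for robust counterfactual inference in Markov decision processes (MDPs). It overcomes a limitation of existing approaches, which rely on a single fixed causal model. The new technique computes tight bounds on counterfactual transition probabilities across all compatible causal models and provides closed-form expressions for efficient computation. It also identifies robust counterfactual policies that optimize worst-case reward under these uncertain MDP probabilities.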

  2. RESEARCH · CL_48581 ·

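    New theory enables reinforcement learning agents to learn from human preferences

    Researchers have developed a theoretical framework for reinforcement learning that uses only human preference feedback. The method applies to episodic kernel Markov decision processes (MDPs) and lets an agent learn an optimal policy by comparing trajectories and receiving binary preference labels. The work provides theoretical guarantees in the form of sublinear regret bounds, showing that with enough episodes the value of the learned policy converges to that of the optimal policy.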

  3. RESEARCH · CL_44036 ·

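    New paper analyzes convergence of Wasserstein Policy Optimization

    A new paper examines the theoretical convergence properties of Wasserstein Policy Optimization (WPO), a reinforcement learning algorithm. The authors argue that WPO converges linearly when applied to entropy-regularized Markov decision processes. The conclusion is supported by recent advances in mean-field analysis and by the establishment of a local log-Sobolev inequality, which together demonstrate monotonic energy dissipation.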

  4. RESEARCH · CL_39995 ·

    New research advances optimization and reinforcement learning theory

    Researchers have developed new theoretical frameworks for optimizing decision-making processes in machine learning. One paper introduces regret-based stopping criteria for Bayesian optimization, ensuring solutions are w…

  5. RESEARCH · CL_32716 ·

    Dynamic Latent Routing boosts low-data fine-tuning for language models

    Researchers have developed Dynamic Latent Routing (DLR), a novel post-training method for language models. DLR jointly learns discrete latent codes, routing policies, and model parameters through a dynamic search proces…

  6. TOOL · CL_28268 ·

    New framework enhances probabilistic safety for autonomous agents

    Researchers have developed a new formal framework for probabilistic safety shields in Markov Decision Processes (MDPs). This framework addresses the complexities of ensuring safety when a certain probability of undesira…

  7. RESEARCH · CL_21752 ·

    Q-MMR framework offers novel approach to off-policy evaluation

    Researchers have introduced Q-MMR, a new theoretical framework for off-policy evaluation in Markov Decision Processes (MDPs). This method learns weights for data points to approximate expected returns under a target pol…

  8. TOOL · CL_16040 ·

    New method learns uncertain MDPs with tighter parameter estimates

    Researchers have developed a new method for learning models of Markov decision processes (MDPs) that accounts for dependencies between transition probabilities. This approach uses parametric MDPs (pMDPs) to represent tr…

  9. RESEARCH · CL_16033 ·

    Actor-Critic RL algorithms achieve optimal sample complexity for MDPs

    Two new arXiv papers explore advancements in actor-critic reinforcement learning algorithms. The first paper, though later withdrawn, proposed an optimal sample complexity of O(ε⁻²) for single-timescale actor-critic met…
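
Several entries above (for example CL_48775 and CL_16040) concern MDPs whose transition probabilities are only known up to bounds, with policies chosen to optimize worst-case reward. As a rough illustration of that idea, and not the method of any listed paper, the sketch below runs textbook robust value iteration on a tiny interval MDP. The toy MDP, the function names `worst_case_expectation` and `robust_value_iteration`, and every number are hypothetical and chosen only for illustration.

```python
def worst_case_expectation(lower, upper, values):
    """Minimise sum(p[i] * values[i]) subject to lower <= p <= upper, sum(p) == 1.

    Greedy solution: start from the lower bounds, then give the remaining
    probability mass to next states in order of increasing value.
    """
    p = list(lower)
    slack = 1.0 - sum(lower)
    for i in sorted(range(len(values)), key=lambda j: values[j]):
        extra = min(upper[i] - lower[i], slack)
        p[i] += extra
        slack -= extra
    return sum(pi * vi for pi, vi in zip(p, values))


def robust_value_iteration(bounds, rewards, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Worst-case value iteration for an interval MDP (illustrative only).

    bounds[s][a]  = (lower, upper): per-next-state probability bounds
    rewards[s][a] = immediate reward for taking action a in state s
    """
    n_states = len(bounds)
    values = [0.0] * n_states

    def q_values(s, v):
        # One worst-case Bellman backup per action available in state s.
        return [rewards[s][a] + gamma * worst_case_expectation(lo, hi, v)
                for a, (lo, hi) in enumerate(bounds[s])]

    for _ in range(max_iter):
        new_values = [max(q_values(s, values)) for s in range(n_states)]
        delta = max(abs(x - y) for x, y in zip(new_values, values))
        values = new_values
        if delta < tol:
            break

    # Greedy policy with respect to the converged worst-case values.
    policy = []
    for s in range(n_states):
        q = q_values(s, values)
        policy.append(q.index(max(q)))
    return values, policy


# Hypothetical toy problem: state 0 is "running", state 1 is an absorbing "failed" state.
bounds = [
    [([1.0, 0.0], [1.0, 0.0]),   # state 0, action 0 (safe): stays in state 0 for sure
     ([0.2, 0.1], [0.9, 0.8])],  # state 0, action 1 (risky): uncertain outcome
    [([0.0, 1.0], [0.0, 1.0]),   # state 1 is absorbing under both actions
     ([0.0, 1.0], [0.0, 1.0])],
]
rewards = [[1.0, 2.0], [0.0, 0.0]]

values, policy = robust_value_iteration(bounds, rewards)
print(values, policy)
```

In this toy problem the risky action pays more immediately, but the adversarial choice of transition probabilities sends most of its mass to the failed state, so the robust policy picks the safe action in state 0, whose worst-case value is 1 / (1 - 0.9) = 10.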