Markov decision processes: a tool for sequential decision making under uncertainty
PulseAugur coverage of Markov decision processes: every cluster mentioning the topic across labs, papers, and developer communities, ranked by signal.
-
New method offers robust counterfactual inference for Markov Decision Processes
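Researchers have developed a new nonparametric method for robust counterfactual inference in Markov decision processes (MDPs). It overcomes a limitation of existing approaches, which rely on a single fixed causal model. The technique computes tight bounds on counterfactual transition probabilities across all compatible causal models and provides closed-form expressions for computing them efficiently. It also identifies robust counterfactual policies that optimize worst-case reward under these uncertain MDP probabilities.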
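To make the idea of optimizing worst-case reward over bounded transition probabilities concrete, here is a minimal sketch of robust value iteration on an interval MDP. It is not the paper's method: the interval bounds, the toy two-state problem, and the names `worst_case_expectation`, `robust_value_iteration`, `BOUNDS`, and `REWARDS` are all assumptions made for illustration, and the bounds are assumed feasible (lower bounds sum to at most 1, upper bounds to at least 1).

```python
def worst_case_expectation(lo, hi, values):
    """Minimize sum(p[i] * values[i]) subject to lo <= p <= hi and sum(p) == 1.

    Greedy solution: start from the lower bounds, then give the leftover
    probability mass to the lowest-value successor states first.
    """
    p = list(lo)
    remaining = 1.0 - sum(lo)
    for i in sorted(range(len(values)), key=lambda i: values[i]):
        add = min(hi[i] - lo[i], remaining)
        p[i] += add
        remaining -= add
    return sum(pi * vi for pi, vi in zip(p, values))


def robust_q(bounds, rewards, v, gamma, s, a):
    """Worst-case action value for (s, a) given the current value estimate v."""
    lo, hi = bounds[s][a]
    return rewards[s][a] + gamma * worst_case_expectation(lo, hi, v)


def robust_value_iteration(bounds, rewards, gamma=0.9, iters=200):
    """Return worst-case optimal values and a greedy policy for an interval MDP."""
    n_states = len(rewards)
    v = [0.0] * n_states
    for _ in range(iters):
        v = [
            max(robust_q(bounds, rewards, v, gamma, s, a)
                for a in range(len(rewards[s])))
            for s in range(n_states)
        ]
    policy = [
        max(range(len(rewards[s])),
            key=lambda a: robust_q(bounds, rewards, v, gamma, s, a))
        for s in range(n_states)
    ]
    return v, policy


# Hypothetical toy problem: state 0 has a "safe" action (index 0) and a
# "risky" action (index 1); state 1 is absorbing with zero reward.
# BOUNDS[s][a] = (lower bounds, upper bounds) over next states.
BOUNDS = [
    [([1.0, 0.0], [1.0, 0.0]),   # safe: stays in state 0 with certainty
     ([0.5, 0.0], [1.0, 0.5])],  # risky: up to 0.5 chance of falling into state 1
    [([0.0, 1.0], [0.0, 1.0]),
     ([0.0, 1.0], [0.0, 1.0])],
]
REWARDS = [[1.0, 2.0], [0.0, 0.0]]

values, policy = robust_value_iteration(BOUNDS, REWARDS)
print(values, policy)
```

In this toy setup the risky action pays more per step, yet the worst-case transition model sends half of its probability mass to the zero-reward absorbing state, so the robust policy prefers the safe action in state 0.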
-
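New theory enables reinforcement learning agents to learn from human preferences
Researchers have developed a theoretical framework for reinforcement learning that uses only human preference feedback. Applied to episodic kernel Markov decision processes (MDPs), the approach lets an agent learn an optimal policy by comparing trajectories and receiving binary preference labels. The study establishes sublinear regret bounds, showing that with enough episodes the value of the learned policy converges to that of the optimal policy.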
-
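New paper analyzes convergence of Wasserstein Policy Optimization
A new paper examines the theoretical convergence properties of Wasserstein Policy Optimization (WPO), a reinforcement learning algorithm. The authors argue that WPO converges linearly when applied to entropy-regularized Markov decision processes. The result rests on recent advances in mean-field analysis and on the establishment of a local log-Sobolev inequality, which together demonstrate monotone energy dissipation.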
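For readers unfamiliar with the entropy-regularized setting, the sketch below shows soft value iteration on a tabular MDP, where the hard max over actions is replaced by a temperature-scaled log-sum-exp and the resulting policy is a softmax over action values. This is a standard textbook construction, not WPO itself; the function names, the temperature `tau`, and the one-state example are assumptions chosen for illustration.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def soft_value_iteration(P, R, gamma=0.9, tau=1.0, iters=500):
    """Soft (entropy-regularized) value iteration on a tabular MDP.

    P[s][a] is a list of next-state probabilities, R[s][a] a reward.
    Returns the soft values and the softmax policy at temperature tau.
    """
    n_states = len(R)
    v = [0.0] * n_states
    q = [[0.0] * len(R[s]) for s in range(n_states)]
    for _ in range(iters):
        # Soft Bellman backup: Q(s, a) = R(s, a) + gamma * E[V(s')]
        q = [
            [R[s][a] + gamma * sum(p * vn for p, vn in zip(P[s][a], v))
             for a in range(len(R[s]))]
            for s in range(n_states)
        ]
        # V(s) = tau * log sum_a exp(Q(s, a) / tau)
        v = [tau * logsumexp([x / tau for x in row]) for row in q]
    # Softmax policy: pi(a | s) is proportional to exp(Q(s, a) / tau)
    policy = []
    for row in q:
        z = logsumexp([x / tau for x in row])
        policy.append([math.exp(x / tau - z) for x in row])
    return v, policy


# Hypothetical one-state MDP with two identical zero-reward self-loop actions.
P = [[[1.0], [1.0]]]
R = [[0.0, 0.0]]
soft_v, soft_pi = soft_value_iteration(P, R)
print(soft_v, soft_pi)
```

With two indistinguishable actions the softmax policy is uniform, and the soft value is positive even though every reward is zero, because the entropy bonus of the uniform policy is counted as part of the return.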
-
New research advances optimization and reinforcement learning theory
Researchers have developed new theoretical frameworks for optimizing decision-making processes in machine learning. One paper introduces regret-based stopping criteria for Bayesian optimization, ensuring solutions are w…
-
Dynamic Latent Routing boosts low-data fine-tuning for language models
Researchers have developed Dynamic Latent Routing (DLR), a novel post-training method for language models. DLR jointly learns discrete latent codes, routing policies, and model parameters through a dynamic search proces…
-
New framework enhances probabilistic safety for autonomous agents
Researchers have developed a new formal framework for probabilistic safety shields in Markov Decision Processes (MDPs). This framework addresses the complexities of ensuring safety when a certain probability of undesira…
-
Q-MMR framework offers novel approach to off-policy evaluation
Researchers have introduced Q-MMR, a new theoretical framework for off-policy evaluation in Markov Decision Processes (MDPs). This method learns weights for data points to approximate expected returns under a target pol…
-
New method learns uncertain MDPs with tighter parameter estimates
Researchers have developed a new method for learning models of Markov decision processes (MDPs) that accounts for dependencies between transition probabilities. This approach uses parametric MDPs (pMDPs) to represent tr…
-
Actor-Critic RL algorithms achieve optimal sample complexity for MDPs
Two new arXiv papers explore advancements in actor-critic reinforcement learning algorithms. The first paper, though later withdrawn, proposed an optimal sample complexity of O(ε⁻²) for single-timescale actor-critic met…