Markov decision process
PulseAugur coverage of Markov decision processes: every cluster mentioning Markov decision processes across labs, papers, and developer communities, ranked by signal.
Sentiment data available for 1 day
-
New Q-value iteration analysis uses switching geometry
This paper introduces a new framework for analyzing Q-value iteration in Markov decision processes, focusing on a technique called rank-one deflation. The authors interpret the algorithm's behavior through the geometry …
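For readers unfamiliar with the algorithm being analyzed, here is a minimal sketch of plain tabular Q-value iteration. It does not implement the paper's rank-one deflation or its geometric interpretation. The two-state MDP, the function name `q_value_iteration`, and the parameters `gamma`, `tol`, and `max_iter` are all hypothetical choices made for illustration.

```python
import numpy as np


def q_value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Tabular Q-value iteration (illustrative sketch, not the paper's method).

    P: array of shape (S, A, S), P[s, a, s2] = transition probability.
    R: array of shape (S, A), expected immediate reward.
    """
    n_states, n_actions, _ = P.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(max_iter):
        V = Q.max(axis=1)              # greedy state values under current Q
        Q_new = R + gamma * (P @ V)    # Bellman optimality backup
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new
    return Q


# Hypothetical toy MDP: state 1 is absorbing with zero reward.
# In state 0, action 0 stays put (reward 1); action 1 exits to state 1 (reward 2).
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.0, 1.0]],
])
R = np.array([
    [1.0, 2.0],
    [0.0, 0.0],
])

Q = q_value_iteration(P, R, gamma=0.9)
greedy_policy = Q.argmax(axis=1)
```

In this toy problem, staying in state 0 forever is worth more than taking the larger one-off reward, so the greedy policy picks action 0 in state 0.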
-
New protocol optimizes drug trial subsidies to boost social utility
Researchers have developed a new statistical protocol for sequential experimentation that aims to optimize social utility in high-stakes domains like drug development. This protocol involves a product developer conducti…
-
Q-MMR framework offers novel approach to off-policy evaluation
Researchers have introduced Q-MMR, a new theoretical framework for off-policy evaluation in Markov Decision Processes (MDPs). This method learns weights for data points to approximate expected returns under a target pol…
-
Reinforcement learning uses symmetry and data augmentation for faster aircraft control
Researchers have developed a new method for offline reinforcement learning that leverages the symmetry of dynamical systems to improve sample efficiency. This approach uses symmetric data augmentation to enhance the sta…
-
Reinforcement learning enhances autonomous target tracking accuracy and robustness
Researchers have developed a deep reinforcement learning approach for autonomous bearings-only tracking of moving targets. The system formulates the observer maneuver problem as a belief Markov decision process, using a…
-
RAST-MoE-RL framework enhances ride-hailing efficiency with specialized AI experts
Researchers have developed a new framework called RAST-MoE-RL to improve efficiency in ride-hailing services. This framework utilizes a Mixture-of-Experts (MoE) approach within deep reinforcement learning to better hand…
-
New research advances adversarial imitation learning theory and practice
Two new papers explore the theoretical underpinnings of adversarial imitation learning (AIL), a technique that uses neural networks to learn from expert demonstrations. The first paper introduces OPT-AIL, a framework de…
-
New metric-normalized posterior leakage (mPL) enhances privacy for joint AI consumption
Researchers have developed a new privacy metric called Metric-Normalized Posterior Leakage (mPL) to address limitations in existing differential privacy methods, particularly for machine learning systems used under join…
-
Researchers find random data deletion improves adaptive RL policies
Researchers have discovered that randomly deleting a portion of training data can significantly improve the performance of adaptive reinforcement learning policies. This counterintuitive technique helps by implicitly do…
-
DRL framework optimizes NR-U/Wi-Fi coexistence for fairness and throughput
Researchers have developed a policy-driven deep reinforcement learning framework to manage resource allocation between NR-U and Wi-Fi networks operating in unlicensed spectrum. This framework uses a deep Q-network to le…
-
AutoREC platform uses RL agents to generate circuit models from EIS data
Researchers have developed AutoREC, an open-source Python package designed to automate the generation of equivalent circuit models (ECMs) from electrochemical impedance spectroscopy (EIS) data. This platform utilizes re…
-
Yann LeCun clarifies technical definition of 'world models' in AI
Yann LeCun shared a technical discussion regarding the term "world models" in AI. He clarified that in control theory and the context of Markov Decision Processes (MDPs), "world models" specifically refers to transition…
-
New research explores Bellman residual minimization for control tasks in reinforcement learning
This paper introduces foundational results for Bellman residual minimization applied to policy optimization in Markov decision problems. While dynamic programming is more common, Bellman residual minimization offers adv…
-
AsyncShield: A Plug-and-Play Edge Adapter for Asynchronous Cloud-based VLA Navigation
Researchers have developed AsyncShield, a new framework designed to improve the navigation capabilities of Vision-Language-Action (VLA) models on mobile robots. This system addresses the latency and network jitter issue…
-
New algorithm identifies near-optimal policies in robust constrained Markov decision processes
Researchers have developed a novel algorithm to identify near-optimal policies in robust constrained Markov decision processes (RCMDPs). This new method addresses limitations in existing policy gradient approaches that …
-
Researchers develop MDP and POMDP formulations for error mitigation in digital twins
Researchers have developed a new framework for mitigating error propagation in modular digital twins by treating it as a sequential decision-making problem. They formulated this using a Markov Decision Process (MDP) and…