Offline Reinforcement Learning
PulseAugur coverage of Offline Reinforcement Learning — every cluster mentioning Offline Reinforcement Learning across labs, papers, and developer communities, ranked by signal.
-
New dataset Insulin4RL enables offline reinforcement learning with irregular clinical data
Researchers have introduced Insulin4RL, a new dataset designed for offline reinforcement learning in healthcare settings. This dataset, derived from MIMIC-IV, contains over 375,000 decisions from 12,209 intensive care u…
-
New framework refines offline RL trajectories using counterfactual flows
Researchers have introduced a new framework called counterfactual transport flows for offline reinforcement learning. This method aims to improve decision-making policies using only logged historical data, without extra…
-
New benchmark standardizes offline RL for nuclear fusion plasma control
Researchers have introduced RL4F, a new benchmark designed to standardize the evaluation of offline reinforcement learning for plasma control in nuclear fusion. This benchmark utilizes historical data from the DIII-D to…
-
New TrojanTO attack targets trajectory optimization models in RL
Researchers have developed TrojanTO, a novel method for executing action-level backdoor attacks against trajectory optimization (TO) models used in offline reinforcement learning. Unlike previous reward-manipulation att…
-
New bootstrap method enhances offline reinforcement learning analysis
Researchers have developed a new model-based bootstrap method for controlled Markov chains, particularly useful in offline reinforcement learning scenarios where the data-generating policy is unknown. This technique est…
-
New ME-AM framework enhances offline RL with entropy maximization
Researchers have introduced Maximum Entropy Adjoint Matching (ME-AM), a new framework designed to improve offline reinforcement learning. This method addresses limitations in existing approaches, such as popularity bias…
-
New Q-Ising method optimizes dynamic treatment allocation on networks
Researchers have developed Q-Ising, a novel three-stage pipeline for dynamic treatment allocation in networks. This method integrates network structure with dynamic treatment strategies, addressing limitations of existi…
-
New AdamO optimizer enhances stability and performance in offline RL
Researchers have introduced AdamO, a novel optimizer designed to enhance stability in offline reinforcement learning. This new optimizer addresses the issue of 'collapse,' where errors in temporal-difference updates can…