Brief

last 24h

[4/4] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

RESEARCH · arXiv stat.ML English(EN) · 1mo · [2 sources]

Model-based Bootstrap of Controlled Markov Chains

Researchers have developed a new model-based bootstrap method for controlled Markov chains, particularly useful in offline reinforcement learning scenarios where the data-generating policy is unknown. This technique establishes distributional consistency for transition estimators and extends to policy evaluation and recovery, providing asymptotically valid confidence intervals for value and Q-functions. Experimental results on the RiverSwim problem demonstrate that the proposed confidence intervals offer improved calibration and coverage compared to existing methods, especially with limited data.

IMPACT Improves confidence interval calibration for offline reinforcement learning, aiding in more reliable policy evaluation and recovery.
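
To make the idea concrete, here is a minimal sketch of a model-based bootstrap for a tabular controlled Markov chain. It is an illustration under stated assumptions, not the paper's procedure: the reward table `R` is assumed known, the interval is a plain percentile interval, a small `smoothing` constant is added to the counts, and all function names (`fit_model`, `simulate`, `policy_value`, `bootstrap_ci`) are hypothetical.

```python
import numpy as np

def fit_model(traj, n_states, n_actions, smoothing=1e-3):
    """Estimate the transition kernel P[s, a, s'] and the (unknown) behavior
    policy pi[s, a] from transition counts. `smoothing` is an assumption that
    keeps every row a valid distribution even for unvisited pairs."""
    counts = np.full((n_states, n_actions, n_states), smoothing)
    for s, a, s_next in traj:
        counts[s, a, s_next] += 1.0
    P = counts / counts.sum(axis=2, keepdims=True)
    sa = counts.sum(axis=2)
    pi = sa / sa.sum(axis=1, keepdims=True)
    return P, pi

def simulate(P, pi, s0, length, rng):
    """Generate one trajectory of (s, a, s') triples from a model and a policy."""
    traj, s = [], s0
    for _ in range(length):
        a = rng.choice(P.shape[1], p=pi[s])
        s_next = rng.choice(P.shape[0], p=P[s, a])
        traj.append((s, a, s_next))
        s = s_next
    return traj

def policy_value(P, R, target, gamma):
    """Value of `target` under model P: solve (I - gamma * P_pi) v = r_pi."""
    P_pi = np.einsum("sa,sat->st", target, P)
    r_pi = (target * R).sum(axis=1)
    return np.linalg.solve(np.eye(P.shape[0]) - gamma * P_pi, r_pi)

def bootstrap_ci(traj, R, target, n_states, n_actions,
                 gamma=0.9, B=200, alpha=0.1, seed=0):
    """Percentile bootstrap interval for the value function (an assumed,
    simple interval construction)."""
    rng = np.random.default_rng(seed)
    P_hat, pi_hat = fit_model(traj, n_states, n_actions)
    v_hat = policy_value(P_hat, R, target, gamma)
    boots = np.empty((B, n_states))
    for b in range(B):
        # Resample a whole trajectory from the fitted model, then refit.
        traj_b = simulate(P_hat, pi_hat, traj[0][0], len(traj), rng)
        P_b, _ = fit_model(traj_b, n_states, n_actions)
        boots[b] = policy_value(P_b, R, target, gamma)
    lo, hi = np.quantile(boots, [alpha / 2, 1 - alpha / 2], axis=0)
    return v_hat, lo, hi
```

The key design point is that resampling happens through the fitted model (transition kernel plus estimated behavior policy) rather than by resampling observed transitions independently, which preserves the sequential dependence of the data.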

TOOL · arXiv cs.LG English(EN) · 1mo

Entropy-Regularized Adjoint Matching for Offline RL

Researchers have introduced Maximum Entropy Adjoint Matching (ME-AM), a new framework designed to improve offline reinforcement learning. This method addresses limitations in existing approaches, such as popularity bias and support binding, by incorporating entropy maximization and a mixture behavior prior. ME-AM aims to enable agents to learn optimal policies from offline datasets more effectively, even in low-density regions, and explore out-of-distribution areas for higher rewards.

IMPACT Adds entropy maximization and a mixture behavior prior to adjoint matching, targeting popularity bias and support binding in offline reinforcement learning.

RESEARCH · arXiv stat.ML English(EN) · 1mo · [2 sources]

Dynamic Treatment on Networks

Researchers have developed Q-Ising, a novel three-stage pipeline for dynamic treatment allocation in networks. This method integrates network structure with dynamic treatment strategies, addressing limitations of existing approaches. Q-Ising estimates network adoption dynamics using a Bayesian dynamic Ising model, augments treatment histories with latent states, and learns a dynamic policy through offline reinforcement learning. The approach quantifies uncertainty in dynamic decisions and provides interpretable spillover estimates, demonstrating superior performance over static benchmarks on microfinance network data.

IMPACT Introduces a new framework for optimizing interventions in networked systems, potentially improving public health and economic strategies.

TOOL · arXiv cs.LG English(EN) · 1mo

AdamO: A Collapse-Suppressed Optimizer for Offline RL

Researchers have introduced AdamO, a novel optimizer designed to enhance stability in offline reinforcement learning. This new optimizer addresses the issue of 'collapse,' where errors in temporal-difference updates can lead to extreme and unusable Q-values. AdamO incorporates orthogonality constraints to prevent the amplification of TD errors, theoretically guaranteeing task safety while maintaining the continuous-time dissipative dynamics of Adam. Empirical results show that AdamO improves stability and performance across various offline RL benchmarks when integrated with existing baselines.

IMPACT Introduces a new optimizer that improves stability and performance in offline reinforcement learning tasks.
