reinforcement learning
PulseAugur coverage of reinforcement learning — every cluster mentioning reinforcement learning across labs, papers, and developer communities, ranked by signal.
- used by robotics 80%
- used by Large Language Models 70%
- used by Group Relative Policy Optimization 70%
- used by chain of thought 70%
- instance of Markov decision process 70%
- affiliated with supervised fine-tuning 70%
- instance of robotics 60%
- used by Markov decision process 60%
- other: supervised fine-tuning 60%
7 days with sentiment data
-
New RL paradigm internalizes outcome supervision for reasoning
Researchers have introduced a novel paradigm for reinforcement learning in reasoning tasks, aiming to overcome the limitations of sparse outcome-level supervision. Their proposed method focuses on internalizing outcome …
-
New Long-Horizon Q-Learning method improves reinforcement learning stability
Researchers have introduced Long-Horizon Q-Learning (LQL), a novel method designed to improve the stability of value-based reinforcement learning. LQL addresses the issue of compounding estimation errors in traditional …
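LQL's own modifications are truncated in the summary above; as context, here is a minimal sketch of the standard tabular Q-learning update it targets, where the bootstrapped `max` over estimated values is exactly where errors compound over long horizons:

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step. The max over the bootstrapped
    Q[s_next] estimates is where value-based methods accumulate
    error over long horizons, the compounding the LQL summary says
    the method addresses."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# Toy 2-state, 2-action table: one update from zero-initialized values.
Q = np.zeros((2, 2))
Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=1)
```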
-
PlatoLTL enables RL agents to generalize across unseen symbols in LTL instructions
Researchers have introduced PlatoLTL, a new method designed to improve generalization in multi-task reinforcement learning. This approach enables RL agents to perform tasks not encountered during training, specifically …
-
New theory explains RLVR optimization dynamics and step-size thresholds
Researchers have developed a theoretical framework for Reinforcement Learning with Verifiable Rewards (RLVR), a technique used to fine-tune large language models with binary feedback. The study introduces a 'Gradient Ga…
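The paper's own quantity is cut off in the summary ('Gradient Ga…'), so as orientation only, here is the standard binary-reward construction RLVR fine-tuning builds on, a mean-baseline advantage over verifiable pass/fail outcomes:

```python
import numpy as np

def rlvr_advantages(rewards):
    """Mean-baseline advantages for binary verifiable rewards: the
    standard fine-tuning setup RLVR theory analyzes. This sketches
    only the baseline construction, not the paper's new theory."""
    r = np.asarray(rewards, dtype=float)  # 1 = verified correct, 0 = not
    return r - r.mean()                   # positive weight on passing samples

adv = rlvr_advantages([1, 1, 0, 0])
```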
-
Reinforcement learning optimizes genetic circuit design under uncertainty
Researchers have developed a new sequential framework utilizing reinforcement learning to optimize the design of genetic circuits, addressing uncertainties inherent in biological systems. This approach employs simulator…
-
New methods enhance on-policy distillation for LLMs
Researchers have developed new methods to improve the efficiency and stability of on-policy distillation (OPD) for large language models. One approach, vOPD, uses a control variate baseline derived from the reverse KL d…
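The vOPD baseline itself is truncated in the summary, so only the underlying quantity it is said to derive from, the token-level reverse KL divergence between student and teacher, is sketched here:

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over a logit vector."""
    z = logits - np.max(logits)
    return z - np.log(np.sum(np.exp(z)))

def reverse_kl(student_logits, teacher_logits):
    """Token-level reverse KL, KL(student || teacher): the divergence
    the vOPD summary says its control-variate baseline is derived
    from. Illustrative only; the baseline construction is not shown."""
    log_ps = log_softmax(np.asarray(student_logits, dtype=float))
    log_pt = log_softmax(np.asarray(teacher_logits, dtype=float))
    return float(np.sum(np.exp(log_ps) * (log_ps - log_pt)))
```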
-
New Gradient-Momentum Coupling metric enhances reinforcement learning progress measurement
Researchers have introduced Gradient-Momentum Coupling (GMC), a novel method for measuring learning progress in reinforcement learning. GMC quantifies the utility of a sample's gradient for ongoing learning by analyzing…
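The summary truncates before GMC's exact definition, so the following is an illustrative assumption rather than the published metric: one plausible reading of "coupling" as cosine alignment between a sample's gradient and the optimizer's momentum buffer.

```python
import numpy as np

def gradient_momentum_coupling(sample_grad, momentum, eps=1e-8):
    """Hypothetical reading of GMC as cosine alignment between a
    sample's gradient and the optimizer's momentum buffer; a sample
    whose gradient points with the current update direction scores
    near 1, an orthogonal one near 0."""
    g = np.asarray(sample_grad, dtype=float).ravel()
    m = np.asarray(momentum, dtype=float).ravel()
    return float(g @ m / (np.linalg.norm(g) * np.linalg.norm(m) + eps))
```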
-
LLMs and behavior trees enhance AI agent task completion with reward shaping
Researchers have developed a novel method called Masking Reward Behavior Tree (MRBT) to enhance the learning efficiency of autonomous agents in complex, multi-step tasks. MRBT utilizes large language models (LLMs) to au…
-
Measure-theoretic theory for adaptive-data fitted Q-iteration developed
Researchers have developed a new theoretical framework for fitted Q-iteration (FQI) that bridges measure-theoretic foundations with practical error analysis in reinforcement learning. This framework provides finite-samp…
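The FQI loop the analysis concerns is standard: repeatedly regress Q onto Bellman targets built from a fixed batch of transitions. A minimal tabular sketch, with per-cell averaging standing in for the general regressor the paper's measure-theoretic analysis covers:

```python
import numpy as np

def fitted_q_iteration(transitions, n_states, n_actions, n_iters=50, gamma=0.9):
    """Tabular fitted Q-iteration: each iteration 'fits' (here:
    averages) Q onto Bellman targets computed from a fixed batch of
    (s, a, r, s') transitions. Illustrative only; the paper analyzes
    general function approximators."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        targets = np.zeros_like(Q)
        counts = np.zeros_like(Q)
        for s, a, r, s2 in transitions:
            targets[s, a] += r + gamma * Q[s2].max()  # Bellman backup
            counts[s, a] += 1
        mask = counts > 0
        Q_next = Q.copy()
        Q_next[mask] = targets[mask] / counts[mask]
        Q = Q_next
    return Q

# Two-state chain: state 0 steps to state 1 with no reward; state 1
# self-loops with reward 1, so Q*(1) = 10 and Q*(0) = 9 at gamma = 0.9.
batch = [(0, 0, 0.0, 1), (1, 0, 1.0, 1)]
Q = fitted_q_iteration(batch, n_states=2, n_actions=1)
```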
-
Fine-tuned LLM masters legal contract negotiation by knowing when to stop
Researchers developed a reinforcement learning environment to train language models for negotiating legal contracts. A smaller, fine-tuned model successfully closed a contract that a significantly larger model failed to…
-
Dream-MPC uses latent imagination for gradient-based model predictive control
Researchers have introduced Dream-MPC, a novel approach for model-based Reinforcement Learning that utilizes gradient-based optimization with latent imagination. This method generates candidate trajectories and refines …
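The core idea the summary describes, refining a candidate action sequence by gradient descent on cost through a latent model, can be sketched with a hand-written linear latent model in place of Dream-MPC's learned world model, and finite differences standing in for autodiff:

```python
import numpy as np

def rollout_cost(actions, z0, A=0.9, B=1.0, target=1.0):
    """Total cost of rolling an action sequence through a toy linear
    latent model z' = A*z + B*a, penalizing distance to a target
    latent state plus a small action penalty."""
    z, cost = z0, 0.0
    for a in actions:
        z = A * z + B * a
        cost += (z - target) ** 2 + 0.01 * a ** 2
    return cost

def gradient_mpc(z0, horizon=5, iters=300, lr=0.02, eps=1e-4):
    """Gradient-based trajectory refinement in the spirit of the
    summary: descend on rollout cost over the action sequence.
    Finite-difference gradients keep the sketch dependency-free."""
    actions = np.zeros(horizon)
    for _ in range(iters):
        grad = np.empty(horizon)
        for i in range(horizon):
            up, dn = actions.copy(), actions.copy()
            up[i] += eps
            dn[i] -= eps
            grad[i] = (rollout_cost(up, z0) - rollout_cost(dn, z0)) / (2 * eps)
        actions -= lr * grad
    return actions

plan = gradient_mpc(z0=0.0)
```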
-
RouteFormer uses transformers and RL for autonomous vehicle routing
Researchers have developed RouteFormer, a novel framework utilizing Transformer architecture and Reinforcement Learning for optimizing routing in autonomous surveillance missions. This approach addresses complex combina…
-
New research explores parallel and restart strategies for efficient stochastic simulations
Researchers have analyzed the efficiency of parallel and restart strategies for stochastic simulations in model-free settings, which are common in reinforcement learning. Their probabilistic analysis reveals an optimal …
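The trade-off behind restart strategies can be seen with a small Monte Carlo sketch (the paper's optimal-threshold result is truncated above; the cutoff and runtime distribution below are illustrative assumptions):

```python
import random

def expected_time_with_restart(sample_runtime, cutoff, trials=20000, seed=0):
    """Monte Carlo estimate of mean completion time when a stochastic
    run is abandoned and restarted whenever it exceeds `cutoff`."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        elapsed = 0.0
        while True:
            run = sample_runtime(rng)
            if run <= cutoff:
                elapsed += run
                break
            elapsed += cutoff  # give up at the cutoff and restart
        total += elapsed
    return total / trials

# Runs finish in 1s half the time and 100s otherwise; restarting at a
# 2s cutoff gives T = 0.5*1 + 0.5*(2 + T), i.e. an expected 3s,
# versus 50.5s with no restarts.
def bimodal(rng):
    return 1.0 if rng.random() < 0.5 else 100.0

est = expected_time_with_restart(bimodal, cutoff=2.0)
```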
-
New Malliavin calculus method estimates counterfactual gradients for adaptive IRL
Researchers have developed a novel passive algorithm for adaptive inverse reinforcement learning (IRL) that reconstructs a forward learner's loss function by observing its gradients. This new method utilizes Malliavin c…
-
vLLM V1 engine rewrite achieves parity with V0 after backend fixes
The vLLM team detailed the process of aligning their new V1 engine with the V0 reference, focusing on ensuring backend parity before addressing Reinforcement Learning (RL) objective changes. They identified a…
-
Pass-rate rewards fail to boost AI code generation, study finds
A new research paper explores the effectiveness of using pass-rate rewards in reinforcement learning for code generation tasks. The study found that while pass-rate rewards can alleviate the issue of sparse rewards, the…
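For reference, the two reward shapes being compared are simple to state: the dense pass-rate reward versus the sparse all-or-nothing baseline (a minimal sketch, not the paper's exact setup):

```python
def pass_rate_reward(test_results):
    """Dense pass-rate reward for code generation: the fraction of
    unit tests a sampled program passes."""
    if not test_results:
        return 0.0
    return sum(bool(r) for r in test_results) / len(test_results)

def sparse_reward(test_results):
    """Sparse baseline: reward 1 only if every test passes."""
    return 1.0 if test_results and all(test_results) else 0.0
```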
-
Aura-CAPTCHA uses RL and GANs for adaptive, multi-modal bot detection
Researchers have developed Aura-CAPTCHA, a novel multi-modal verification system designed to thwart bot attacks. This system combines Generative Adversarial Networks (GANs) for visual challenges, Reinforcement Learning …
-
Category theory framework proposed for defining and comparing AGI architectures
This working paper proposes a formal framework for comparing different Artificial General Intelligence (AGI) architectures using category theory. The authors aim to provide a unified foundation for AGI systems, integrat…
-
New framework 'Mechanical Conscience' offers trajectory-level regulation for AI
A new paper introduces "mechanical conscience" (MC), a mathematical framework designed to regulate the behavior of intelligent systems, particularly in distributed collaborative intelligence (DCI) environments. This fra…
-
Quantum circuits enhance hierarchical reinforcement learning agents, saving parameters
Researchers have developed a hybrid hierarchical reinforcement learning agent that integrates variational quantum circuits into its architecture. This approach substitutes classical components with quantum circuits for …