reinforcement learning
PulseAugur coverage of reinforcement learning — every cluster mentioning reinforcement learning across labs, papers, and developer communities, ranked by signal.
- used by robotics 80%
- used by Large Language Models 70%
- used by Group Relative Policy Optimization 70%
- used by chain of thought 70%
- instance of Markov decision process 70%
- affiliated with supervised fine-tuning 70%
- instance of robotics 60%
- used by Markov decision process 60%
- other: supervised fine-tuning 60%
7 days with sentiment data
-
New RL paradigm internalizes outcome supervision for reasoning
Researchers have introduced a novel paradigm for reinforcement learning in reasoning tasks, aiming to overcome the limitations of sparse outcome-level supervision. Their proposed method focuses on internalizing outcome …
-
New Long-Horizon Q-Learning method improves reinforcement learning stability
Researchers have introduced Long-Horizon Q-Learning (LQL), a novel method designed to improve the stability of value-based reinforcement learning. LQL addresses the issue of compounding estimation errors in traditional …
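LQL's own modifications are truncated in the summary above; as context, here is a minimal sketch of the standard tabular Q-learning update it targets, where the bootstrapped `max` over estimated values is exactly where errors compound over long horizons:

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step. The max over the bootstrapped
    Q[s_next] estimates is where value-based methods accumulate
    error over long horizons, the compounding the LQL summary says
    the method addresses."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# Toy 2-state, 2-action table: one update from zero-initialized values.
Q = np.zeros((2, 2))
Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=1)
```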
-
PlatoLTL enables RL agents to generalize across unseen symbols in LTL instructions
Researchers have introduced PlatoLTL, a new method designed to improve generalization in multi-task reinforcement learning. This approach enables RL agents to perform tasks not encountered during training, specifically …
-
New theory explains RLVR optimization dynamics and step-size thresholds
Researchers have developed a theoretical framework for Reinforcement Learning with Verifiable Rewards (RLVR), a technique used to fine-tune large language models with binary feedback. The study introduces a 'Gradient Ga…
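The paper's own quantity is cut off in the summary ('Gradient Ga…'), so as orientation only, here is the standard binary-reward construction RLVR fine-tuning builds on, a mean-baseline advantage over verifiable pass/fail outcomes:

```python
import numpy as np

def rlvr_advantages(rewards):
    """Mean-baseline advantages for binary verifiable rewards: the
    standard fine-tuning setup RLVR theory analyzes. This sketches
    only the baseline construction, not the paper's new theory."""
    r = np.asarray(rewards, dtype=float)  # 1 = verified correct, 0 = not
    return r - r.mean()                   # positive weight on passing samples

adv = rlvr_advantages([1, 1, 0, 0])
```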
-
Reinforcement learning optimizes genetic circuit design under uncertainty
Researchers have developed a new sequential framework utilizing reinforcement learning to optimize the design of genetic circuits, addressing uncertainties inherent in biological systems. This approach employs simulator…
-
New methods enhance on-policy distillation for LLMs
Researchers have developed new methods to improve the efficiency and stability of on-policy distillation (OPD) for large language models. One approach, vOPD, uses a control variate baseline derived from the reverse KL d…
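The vOPD baseline itself is truncated in the summary, so only the underlying quantity it is said to derive from, the token-level reverse KL divergence between student and teacher, is sketched here:

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over a logit vector."""
    z = logits - np.max(logits)
    return z - np.log(np.sum(np.exp(z)))

def reverse_kl(student_logits, teacher_logits):
    """Token-level reverse KL, KL(student || teacher): the divergence
    the vOPD summary says its control-variate baseline is derived
    from. Illustrative only; the baseline construction is not shown."""
    log_ps = log_softmax(np.asarray(student_logits, dtype=float))
    log_pt = log_softmax(np.asarray(teacher_logits, dtype=float))
    return float(np.sum(np.exp(log_ps) * (log_ps - log_pt)))
```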
-
New Gradient-Momentum Coupling metric enhances reinforcement learning progress measurement
Researchers have introduced Gradient-Momentum Coupling (GMC), a novel method for measuring learning progress in reinforcement learning. GMC quantifies the utility of a sample's gradient for ongoing learning by analyzing…
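The summary truncates before GMC's exact definition, so the following is an illustrative assumption rather than the published metric: one plausible reading of "coupling" as cosine alignment between a sample's gradient and the optimizer's momentum buffer.

```python
import numpy as np

def gradient_momentum_coupling(sample_grad, momentum, eps=1e-8):
    """Hypothetical reading of GMC as cosine alignment between a
    sample's gradient and the optimizer's momentum buffer; a sample
    whose gradient points with the current update direction scores
    near 1, an orthogonal one near 0."""
    g = np.asarray(sample_grad, dtype=float).ravel()
    m = np.asarray(momentum, dtype=float).ravel()
    return float(g @ m / (np.linalg.norm(g) * np.linalg.norm(m) + eps))
```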
-
LLMs and behavior trees enhance AI agent task completion with reward shaping
Researchers have developed a novel method called Masking Reward Behavior Tree (MRBT) to enhance the learning efficiency of autonomous agents in complex, multi-step tasks. MRBT utilizes large language models (LLMs) to au…
-
Measure-theoretic theory for adaptive-data fitted Q-iteration developed
Researchers have developed a new theoretical framework for fitted Q-iteration (FQI) that bridges measure-theoretic foundations with practical error analysis in reinforcement learning. This framework provides finite-samp…
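The FQI loop the analysis concerns is standard: repeatedly regress Q onto Bellman targets built from a fixed batch of transitions. A minimal tabular sketch, with per-cell averaging standing in for the general regressor the paper's measure-theoretic analysis covers:

```python
import numpy as np

def fitted_q_iteration(transitions, n_states, n_actions, n_iters=50, gamma=0.9):
    """Tabular fitted Q-iteration: each iteration 'fits' (here:
    averages) Q onto Bellman targets computed from a fixed batch of
    (s, a, r, s') transitions. Illustrative only; the paper analyzes
    general function approximators."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        targets = np.zeros_like(Q)
        counts = np.zeros_like(Q)
        for s, a, r, s2 in transitions:
            targets[s, a] += r + gamma * Q[s2].max()  # Bellman backup
            counts[s, a] += 1
        mask = counts > 0
        Q_next = Q.copy()
        Q_next[mask] = targets[mask] / counts[mask]
        Q = Q_next
    return Q

# Two-state chain: state 0 steps to state 1 with no reward; state 1
# self-loops with reward 1, so Q*(1) = 10 and Q*(0) = 9 at gamma = 0.9.
batch = [(0, 0, 0.0, 1), (1, 0, 1.0, 1)]
Q = fitted_q_iteration(batch, n_states=2, n_actions=1)
```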
-
Fine-tuned LLM masters legal contract negotiation by knowing when to stop
Researchers developed a reinforcement learning environment to train language models for negotiating legal contracts. A smaller, fine-tuned model successfully closed a contract that a significantly larger model failed to…
-
Dream-MPC uses latent imagination for gradient-based model predictive control
Researchers have introduced Dream-MPC, a novel approach for model-based Reinforcement Learning that utilizes gradient-based optimization with latent imagination. This method generates candidate trajectories and refines …
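The core idea the summary describes, refining a candidate action sequence by gradient descent on cost through a latent model, can be sketched with a hand-written linear latent model in place of Dream-MPC's learned world model, and finite differences standing in for autodiff:

```python
import numpy as np

def rollout_cost(actions, z0, A=0.9, B=1.0, target=1.0):
    """Total cost of rolling an action sequence through a toy linear
    latent model z' = A*z + B*a, penalizing distance to a target
    latent state plus a small action penalty."""
    z, cost = z0, 0.0
    for a in actions:
        z = A * z + B * a
        cost += (z - target) ** 2 + 0.01 * a ** 2
    return cost

def gradient_mpc(z0, horizon=5, iters=300, lr=0.02, eps=1e-4):
    """Gradient-based trajectory refinement in the spirit of the
    summary: descend on rollout cost over the action sequence.
    Finite-difference gradients keep the sketch dependency-free."""
    actions = np.zeros(horizon)
    for _ in range(iters):
        grad = np.empty(horizon)
        for i in range(horizon):
            up, dn = actions.copy(), actions.copy()
            up[i] += eps
            dn[i] -= eps
            grad[i] = (rollout_cost(up, z0) - rollout_cost(dn, z0)) / (2 * eps)
        actions -= lr * grad
    return actions

plan = gradient_mpc(z0=0.0)
```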
-
RouteFormer uses transformers and RL for autonomous vehicle routing
Researchers have developed RouteFormer, a novel framework utilizing Transformer architecture and Reinforcement Learning for optimizing routing in autonomous surveillance missions. This approach addresses complex combina…
-
New research explores parallel and restart strategies for efficient stochastic simulations
Researchers have analyzed the efficiency of parallel and restart strategies for stochastic simulations in model-free settings, which are common in reinforcement learning. Their probabilistic analysis reveals an optimal …
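The trade-off behind restart strategies can be seen with a small Monte Carlo sketch (the paper's optimal-threshold result is truncated above; the cutoff and runtime distribution below are illustrative assumptions):

```python
import random

def expected_time_with_restart(sample_runtime, cutoff, trials=20000, seed=0):
    """Monte Carlo estimate of mean completion time when a stochastic
    run is abandoned and restarted whenever it exceeds `cutoff`."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        elapsed = 0.0
        while True:
            run = sample_runtime(rng)
            if run <= cutoff:
                elapsed += run
                break
            elapsed += cutoff  # give up at the cutoff and restart
        total += elapsed
    return total / trials

# Runs finish in 1s half the time and 100s otherwise; restarting at a
# 2s cutoff gives T = 0.5*1 + 0.5*(2 + T), i.e. an expected 3s,
# versus 50.5s with no restarts.
def bimodal(rng):
    return 1.0 if rng.random() < 0.5 else 100.0

est = expected_time_with_restart(bimodal, cutoff=2.0)
```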
-
New Malliavin calculus method estimates counterfactual gradients for adaptive IRL
Researchers have developed a novel passive algorithm for adaptive inverse reinforcement learning (IRL) that reconstructs a forward learner's loss function by observing its gradients. This new method utilizes Malliavin c…
-
vLLM V1 engine rewrite achieves parity with V0 after backend fixes
The vLLM team detailed the process of aligning their new V1 engine with the V0 reference, focusing on ensuring backend parity before addressing Reinforcement Learning (RL) objective changes. They identified a…
-
Pass-rate rewards fail to boost AI code generation, study finds
A new research paper explores the effectiveness of using pass-rate rewards in reinforcement learning for code generation tasks. The study found that while pass-rate rewards can alleviate the issue of sparse rewards, the…
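For reference, the two reward shapes being compared are simple to state: the dense pass-rate reward versus the sparse all-or-nothing baseline (a minimal sketch, not the paper's exact setup):

```python
def pass_rate_reward(test_results):
    """Dense pass-rate reward for code generation: the fraction of
    unit tests a sampled program passes."""
    if not test_results:
        return 0.0
    return sum(bool(r) for r in test_results) / len(test_results)

def sparse_reward(test_results):
    """Sparse baseline: reward 1 only if every test passes."""
    return 1.0 if test_results and all(test_results) else 0.0
```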
-
Aura-CAPTCHA uses RL and GANs for adaptive, multi-modal bot detection
Researchers have developed Aura-CAPTCHA, a novel multi-modal verification system designed to thwart bot attacks. This system combines Generative Adversarial Networks (GANs) for visual challenges, Reinforcement Learning …
-
Category theory framework proposed for defining and comparing AGI architectures
This working paper proposes a formal framework for comparing different Artificial General Intelligence (AGI) architectures using category theory. The authors aim to provide a unified foundation for AGI systems, integrat…
-
New framework 'Mechanical Conscience' offers trajectory-level regulation for AI
A new paper introduces "mechanical conscience" (MC), a mathematical framework designed to regulate the behavior of intelligent systems, particularly in distributed collaborative intelligence (DCI) environments. This fra…
-
Quantum circuits enhance hierarchical reinforcement learning agents, saving parameters
Researchers have developed a hybrid hierarchical reinforcement learning agent that integrates variational quantum circuits into its architecture. This approach substitutes classical components with quantum circuits for …