reinforcement learning
PulseAugur coverage of reinforcement learning — every cluster mentioning reinforcement learning across labs, papers, and developer communities, ranked by signal.
- instance of Soft Actor-Critic Reinforcement Learning for Robotic Manipulator with Hindsight Experience Replay 95%
- used by large-language models 90%
- used by GRPO 90%
- used by Markov decision process 90%
- used by large language model 90%
- used by Soft Actor-Critic 90%
- developed by large-language models 70%
- developed by GRPO 70%
- used by robotics 70%
- used by supervised fine-tuning 70%
- used by Group Relative Policy Optimization 70%
- employs Diffusion Models 70%
- 2026-05-18 research_milestone A new paper proposes a reinforcement learning framework for modeling customer trajectories in retail.
26 day(s) with sentiment data
-
Graph-GRPO enhances e-commerce search relevance with LLMs
Researchers have developed Graph-GRPO, a novel framework for improving e-commerce search relevance by leveraging large language models and reinforcement learning. This method constructs a dependency graph of reasoning s…
-
AI explores adaptive control systems in everyday tech
This cluster discusses adaptive control systems within AI, posing a question about which everyday systems best adapt to change. It highlights robotics, reinforcement learning, electrical engineering, and feedback loops …
-
Apertus LLM team seeks AI research engineers in Switzerland
The Apertus LLM team is seeking AI research engineers to join their FOSS initiative in Lausanne, Switzerland. Ideal candidates will have experience in software, data, and ML engineering, with a specific interest in post…
-
Fireworks AI details complex RL infrastructure for continuous model updates
Fireworks AI is detailing the engineering challenges and solutions involved in training large language models, particularly focusing on Reinforcement Learning (RL) from human feedback. They highlight that a product's re…
-
New method GUI-CIDER boosts GUI agent knowledge
Researchers have developed GUI-CIDER, a novel mid-training method designed to enhance the world knowledge of GUI agents built with multimodal large language models. This approach explicitly internalizes GUI operational …
-
Reinforcement learning math series explains core agent reasoning tools
Shawn Hymel's latest post in his Reinforcement Learning math series explains key concepts like expected return, state value function (v(s)), and action-value function (q(s,a)). These mathematical tools are fundamental f…
-
Reinforcement learning pioneer partners with Chinese firm on 'Robot Kindergarten'
Richard Sutton, a pioneer in reinforcement learning, has partnered with Chinese haptic technology company HeShan Technology to launch a "Robot Kindergarten" project. This initiative aims to train embodied AI agents thro…
-
New RL method improves transfer learning with Bellman alignment
Researchers have introduced a new method called One-Step Bellman Alignment (RWT) to improve transfer learning in online reinforcement learning. This technique addresses the challenge of using data from related source ta…
-
Soft synthetic snakes learn to navigate complex 3D terrains
Researchers have developed a computational framework enabling soft synthetic snakes to navigate complex 3D terrains. The system uses bio-inspired actuation and sensing models to simplify control for these high-degree-of…
-
New RL framework optimizes laser manufacturing scan orders
Researchers have developed a new framework to improve reinforcement learning for optimizing scan orders in laser additive manufacturing. This bilevel Proxy-FEA diagnostic approach uses lightweight proxies for rapid can…
-
ResDreamer model enhances RL agents with hierarchical visual reasoning
Researchers have developed ResDreamer, a novel hierarchical world model designed to improve reinforcement learning in complex 3D environments. This self-supervised approach trains layers to reconstruct residuals of the …
-
New ERPD method enhances LLM reinforcement learning
Researchers have developed Extreme Region Policy Distillation (ERPD), a novel two-stage framework for reinforcement learning in large language models. This method aims to overcome the trade-off between sample efficiency…
-
New CEDGE framework uses diffusion models for off-dynamics reinforcement learning
Researchers have developed CEDGE, a novel framework for off-dynamics reinforcement learning that utilizes diffusion models to generate synthetic trajectories. This approach trains a diffusion model on source-domain data…
-
AI Research Advances Policy Optimization for LLMs and Robotics
Researchers are developing new methods to improve policy optimization in reinforcement learning, particularly for large language models and robotics. Techniques like Physics-Guided Policy Optimization (PGPO) and Hint-Gu…
-
Reinforcement learning optimizes EV charging for lower emissions
Researchers have developed a new emission-aware reinforcement learning strategy to optimize electric vehicle charging. This approach, based on the Soft Actor-Critic algorithm, prioritizes reducing carbon emissions and m…
-
AI research paper advocates for enactive perception and embodied interaction
This paper proposes integrating enactive approaches into artificial intelligence, viewing perception as an active, embodied engagement with the environment rather than passive input processing. It highlights four key en…
-
Neuro-inspired Inverter framework enhances AI planning and control
Researchers have developed a novel neuro-inspired framework called Inverter for embodied planning and control. This framework utilizes Inverse Learning (IL) to train components, bridging the gap between reinforcement le…
-
Quantum Frog game shows cooperation improves agent success
Researchers have developed a new cooperative game called Quantum Frog, inspired by Frogger, which uses a quantized-time mechanic where the environment only advances when a player acts. Using reinforcement learning, they…
-
AI Safety expert critiques Bengio's 'Scientist AI' plan
A critique of Yoshua Bengio's "Scientist AI" proposal raises concerns about its alignment failures and practical feasibility. The author argues that preventing the AI from exploring agentically, a key aspect of scientif…
-
Fireworks AI: Frontier RL infrastructure costs are lower than believed
Fireworks AI argues that the conventional wisdom regarding the cost of frontier Reinforcement Learning (RL) infrastructure is flawed. They propose that instead of transferring entire multi-terabyte model checkpoints for…