reinforcement learning
PulseAugur coverage of reinforcement learning — every cluster mentioning reinforcement learning across labs, papers, and developer communities, ranked by signal.
- instance of Soft Actor-Critic Reinforcement Learning for Robotic Manipulator with Hindsight Experience Replay 95%
- used by large language models 90%
- used by GRPO 90%
- used by Markov decision process 90%
- instance of Multi-agent reinforcement learning 90%
- instance of Very Large Array 90%
- used by large language model 90%
- used by Soft Actor-Critic 90%
- developed by large language models 70%
- developed by GRPO 70%
- used by robotics 70%
- used by supervised fine-tuning 70%
- 2026-05-18 (research milestone): A new paper proposes a reinforcement learning framework for modeling customer trajectories in retail.
-
New CIG reward method enhances reinforcement learning exploration
Researchers have introduced Conditional Information Gain (CIG), a novel reward mechanism for reinforcement learning designed to improve exploration strategies. CIG addresses limitations of existing methods by providing …
-
AI framework optimizes aeroengine pipe design for manufacturing
Researchers have developed a new reinforcement learning framework, called FPRO, to optimize the design and manufacturing of free-form pipes in aeroengines. This approach integrates domain-specific manufacturing knowledg…
-
Mahjong RL simulator Mahjax achieves 2M steps/sec on GPUs
Researchers have developed Mahjax, a new GPU-accelerated simulator for the complex game of Riichi Mahjong, implemented in JAX. This tool is designed to facilitate reinforcement learning research, particularly for agents…
-
Reinforcement learning explained: policies, MDPs, and trajectories
This article explains how reinforcement learning agents make decisions by defining key concepts. It covers policies, Markov Decision Processes (MDPs), and trajectories. The series aims to build understanding towards the…
-
New research advances optimization and reinforcement learning theory
Researchers have developed new theoretical frameworks for optimizing decision-making processes in machine learning. One paper introduces regret-based stopping criteria for Bayesian optimization, ensuring solutions are w…
-
New RL jailbreak method exploits LRM attention patterns
Researchers have developed a new jailbreak method specifically targeting Large Reasoning Models (LRMs), which are known for their step-by-step problem-solving abilities. The method leverages reinforcement learning and i…
-
New flow matching methods enhance generative modeling and RL
Researchers are advancing flow matching techniques for generative modeling across various domains. New methods like Kinetic Path Energy (KPE) and Kinetic Trajectory Shaping (KTS) aim to improve generation quality by ana…
-
Reinforcement learning optimizes physical activity for health biomarkers
Researchers have developed a novel offline reinforcement learning algorithm to create personalized physical activity recommendations. This algorithm analyzes step count data and health biomarkers from the All of Us Rese…
-
Latent visual reasoning tokens prove non-essential for inference
Researchers have investigated the role of latent visual reasoning, a technique that incorporates visual evidence into multimodal reasoning by using continuous latent tokens before text generation. Their findings suggest…
-
DiPRL method learns discrete programmatic policies for reinforcement learning
Researchers have developed DiPRL, a novel method for learning discrete programmatic policies in reinforcement learning. This approach aims to overcome the performance degradation often seen when converting continuous pr…
-
Reinforcement learning models customer retail journeys for layout optimization
Researchers have developed a new reinforcement learning (RL) framework to model customer movement in retail environments, aiming to provide practical insights for store layout optimization. This approach treats customer…
-
New PRISM framework corrects SFT flaws in multimodal LLM training
New research from institutions including the Hong Kong University of Science and Technology (Guangzhou) reveals a critical flaw in the common post-training paradigm for multimodal large language models (MLLMs). The stan…
-
Developer uses domain randomization to train robust reinforcement learning agents
A developer has made progress in training reinforcement learning agents using domain randomization. This technique helps create more robust agents, and the developer has successfully implemented it to improve a bot's ab…
-
New OptMuon method enhances stochastic optimization with adaptive momentum
Researchers have introduced OptMuon, a novel adaptive momentum orthogonalization method for stochastic nonconvex optimization that calibrates update magnitudes from observed trajectories. This approach combines Muon-sty…
-
New method enhances vision-language models with group revision
Researchers have introduced a new group-revision optimization paradigm to improve object-level grounding in large vision-language models. This method addresses the limitations of sparse, response-level rewards in existi…
-
RL agent controls GenAI access to boost student learning
A new research paper proposes using reinforcement learning to control when students can access generative AI tools in educational settings. The study found that strategically timed access, managed by an RL agent, improv…
-
New E²PO framework enhances generative model alignment with human preference
Researchers have introduced a new framework called Embedding-perturbed Exploration Preference Optimization (E²PO) to address limitations in aligning generative models with human intent using reinforcement learning. Exis…
-
Lamarckian inheritance benefits robots in predictable, dynamic environments
Researchers have explored the impact of Lamarckian inheritance on evolutionary dynamics in dynamic environments for robotic agents. Their findings suggest that the benefit of Lamarckian inheritance, where learned traits…
-
New framework combines knowledge and RL for vehicle routing problems
Researchers have developed a new framework for solving the Capacitated Vehicle Routing Problem (CVRP), a complex logistics challenge. Their approach integrates knowledge-based heuristics with reinforcement learning, bre…
-
New framework unifies sampling and optimization problems
This paper introduces the multi-armed sampling problem, a new framework that mirrors the multi-armed bandit problem but focuses on sampling rather than optimization. Researchers have defined regret measures and establis…
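
For readers following the explainer entry on policies, MDPs, and trajectories listed above, the three concepts can be sketched in a few lines of Python. This is a minimal illustration, not code from any of the linked articles: the two-state MDP, its probabilities and rewards, and the names `TRANSITIONS`, `policy`, `step`, and `rollout` are all hypothetical.

```python
import random

# Hypothetical two-state MDP, invented purely for illustration.
# TRANSITIONS[state][action] -> list of (probability, next_state, reward)
TRANSITIONS = {
    "idle": {
        "wait": [(1.0, "idle", 0.0)],
        "work": [(0.8, "done", 1.0), (0.2, "idle", 0.0)],
    },
    "done": {},  # terminal state: no actions available
}


def policy(state):
    """A deterministic policy: a mapping from state to action."""
    return "work"


def step(state, action, rng):
    """Sample (next_state, reward) from the MDP's transition distribution."""
    outcomes = TRANSITIONS[state][action]
    weights = [prob for prob, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward


def rollout(start, rng, max_steps=50):
    """Follow the policy from `start`; return the trajectory and final state.

    The trajectory is the recorded sequence of (state, action, reward) tuples.
    """
    trajectory = []
    state = start
    for _ in range(max_steps):
        if not TRANSITIONS[state]:  # stop once a terminal state is reached
            break
        action = policy(state)
        next_state, reward = step(state, action, rng)
        trajectory.append((state, action, reward))
        state = next_state
    return trajectory, state


if __name__ == "__main__":
    traj, final_state = rollout("idle", random.Random(0))
    for state, action, reward in traj:
        print(state, action, reward)
    print("final state:", final_state)
```

The MDP here is just the transition table, the policy is the function choosing an action per state, and the trajectory is whatever sequence one rollout happens to produce; a learning algorithm would adjust `policy` based on the rewards observed along such trajectories.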