ENTITY reinforcement learning

reinforcement learning

PulseAugur coverage of reinforcement learning — every cluster mentioning reinforcement learning across labs, papers, and developer communities, ranked by signal.

Total · 30d: 234 (234 over 90d)

Releases · 30d: 0 (0 over 90d)

Papers · 30d: 223 (223 over 90d)

TIER MIX · 90D

significant 2
research 87
tool 137
commentary 8

TOPICS

paper 223
other 118
model release 50
safety 42
product 37
infra 13
opinion 2
funding 2

RELATIONSHIPS

instance of SOFT ACTOR-CRITIC REINFORCEMENT LEARNING FOR ROBOTIC MANIPULATOR WITH HINDSIGHT EXPERIENCE REPLAY 95%
used by large language models 90%
used by GRPO 90%
used by Markov decision process 90%
instance of Multi-agent reinforcement learning 90%
instance of Very Large Array 90%
used by large language model 90%
used by Soft Actor-Critic 90%
developed by large language models 70%
developed by GRPO 70%
used by robotics 70%
used by supervised fine-tuning 70%

TIMELINE

2026-05-18 research_milestone A new paper proposes a reinforcement learning framework for modeling customer trajectories in retail.

SENTIMENT · 30D

25 days with sentiment data

RECENT · 200 TOTAL

TOOL · CL_41868 · May 20 · 08:15

New CIG reward method enhances reinforcement learning exploration

Researchers have introduced Conditional Information Gain (CIG), a novel reward mechanism for reinforcement learning designed to improve exploration strategies. CIG addresses limitations of existing methods by providing …

RESEARCH · CL_41798 · May 20 · 03:07

AI framework optimizes aeroengine pipe design for manufacturing

Researchers have developed a new reinforcement learning framework, called FPRO, to optimize the design and manufacturing of free-form pipes in aeroengines. This approach integrates domain-specific manufacturing knowledg…

RESEARCH · CL_42791 · May 20 · 00:33

Mahjong RL simulator Mahjax achieves 2M steps/sec on GPUs

Researchers have developed Mahjax, a new GPU-accelerated simulator for the complex game of Riichi Mahjong, implemented in JAX. This tool is designed to facilitate reinforcement learning research, particularly for agents…

TOOL · CL_39391 · May 19 · 17:30

Reinforcement learning explained: policies, MDPs, and trajectories

This article explains how reinforcement learning agents make decisions by defining key concepts. It covers policies, Markov Decision Processes (MDPs), and trajectories. The series aims to build understanding towards the…
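The article's own examples are not reproduced here. As a rough, hypothetical illustration of the three concepts it covers, the Python sketch below stores a small finite MDP as a table mapping (state, action) to possible (probability, next state, reward) outcomes, represents a stochastic policy π(a | s) as per-state action probabilities, and builds a trajectory by rolling that policy out as a sequence of (state, action, reward) triples. All names (`MDP`, `sample_action`, `rollout`, `discounted_return`) and the two-state toy problem are assumptions for illustration, not taken from the article.

```python
import random
from dataclasses import dataclass, field


@dataclass
class MDP:
    """Toy finite MDP (hypothetical example, not from the article)."""
    # (state, action) -> list of (probability, next_state, reward) outcomes
    transitions: dict
    terminal_states: set = field(default_factory=set)

    def step(self, state, action, rng):
        """Sample (next_state, reward) according to P(s', r | s, a)."""
        outcomes = self.transitions[(state, action)]
        weights = [p for p, _, _ in outcomes]
        _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
        return next_state, reward


def sample_action(policy, state, rng):
    """Stochastic policy pi(a | s), stored as {state: {action: probability}}."""
    actions = list(policy[state])
    probs = [policy[state][a] for a in actions]
    return rng.choices(actions, weights=probs, k=1)[0]


def rollout(mdp, policy, start_state, rng, max_steps=20):
    """Roll the policy out and return the trajectory as (state, action, reward) triples."""
    trajectory = []
    state = start_state
    for _ in range(max_steps):
        if state in mdp.terminal_states:
            break
        action = sample_action(policy, state, rng)
        next_state, reward = mdp.step(state, action, rng)
        trajectory.append((state, action, reward))
        state = next_state
    return trajectory


def discounted_return(trajectory, gamma=0.9):
    """G = sum over t of gamma**t * r_{t+1} along one trajectory."""
    return sum((gamma ** t) * r for t, (_, _, r) in enumerate(trajectory))


# Hypothetical two-state problem: "go" reaches "goal" (+1 reward) with probability 0.8.
toy_mdp = MDP(
    transitions={
        ("start", "go"): [(0.8, "goal", 1.0), (0.2, "start", 0.0)],
        ("start", "stay"): [(1.0, "start", 0.0)],
    },
    terminal_states={"goal"},
)
always_go = {"start": {"go": 1.0, "stay": 0.0}}

trajectory = rollout(toy_mdp, always_go, "start", random.Random(0))
print(trajectory, discounted_return(trajectory))
```

Here the trajectory ends once `"goal"` is reached (or after `max_steps` steps), and `discounted_return` collapses it into a single discounted sum of rewards. Writing out P(s', r | s, a) explicitly like this only works for small, fully known MDPs; RL agents usually learn from sampled transitions instead.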
RESEARCH · CL_39995 · May 19 · 12:39

New research advances optimization and reinforcement learning theory

Researchers have developed new theoretical frameworks for optimizing decision-making processes in machine learning. One paper introduces regret-based stopping criteria for Bayesian optimization, ensuring solutions are w…

TOOL · CL_41182 · May 19 · 07:36

New RL jailbreak method exploits LRM attention patterns

Researchers have developed a new jailbreak method specifically targeting Large Reasoning Models (LRMs), which are known for their step-by-step problem-solving abilities. The method leverages reinforcement learning and i…

RESEARCH · CL_39980 · May 19 · 03:33

New flow matching methods enhance generative modeling and RL

Researchers are advancing flow matching techniques for generative modeling across various domains. New methods like Kinetic Path Energy (KPE) and Kinetic Trajectory Shaping (KTS) aim to improve generation quality by ana…

RESEARCH · CL_39989 · May 19 · 00:17

Reinforcement learning optimizes physical activity for health biomarkers

Researchers have developed a novel offline reinforcement learning algorithm to create personalized physical activity recommendations. This algorithm analyzes step count data and health biomarkers from the All of Us Rese…

TOOL · CL_38815 · May 18 · 16:46

Latent visual reasoning tokens prove non-essential for inference

Researchers have investigated the role of latent visual reasoning, a technique that incorporates visual evidence into multimodal reasoning by using continuous latent tokens before text generation. Their findings suggest…

TOOL · CL_38262 · May 18 · 15:01

DiPRL method learns discrete programmatic policies for reinforcement learning

Researchers have developed DiPRL, a novel method for learning discrete programmatic policies in reinforcement learning. This approach aims to overcome the performance degradation often seen when converting continuous pr…

TOOL · CL_38270 · May 18 · 14:17

Reinforcement learning models customer retail journeys for layout optimization

Researchers have developed a new reinforcement learning (RL) framework to model customer movement in retail environments, aiming to provide practical insights for store layout optimization. This approach treats customer…

TOOL · CL_35221 · May 17 · 03:42

New PRISM framework corrects SFT flaws in multimodal LLM training

New research from institutions including the Hong Kong University of Science and Technology (Guangzhou) reveals a critical flaw in the common post-training paradigm for multimodal large language models (MLLMs). The stan…

TOOL · CL_34696 · May 16 · 15:18

Developer uses domain randomization to train robust reinforcement learning agents

A developer has made progress in training reinforcement learning agents using domain randomization. This technique helps create more robust agents, and the developer has successfully implemented it to improve a bot's ab…
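The developer's code is not shown, so the following is only a minimal sketch of the general technique, assuming a hypothetical 1-D point-mass simulator (`ToyCartEnv`) whose mass, friction, and sensor noise are re-sampled from hypothetical ranges (`PARAM_RANGES`) at the start of every training episode. The policy-update step is deliberately left out; only the randomization loop is shown.

```python
import random
from dataclasses import dataclass


@dataclass
class PhysicsParams:
    mass: float
    friction: float
    sensor_noise_std: float


# Hypothetical ranges; real ones depend on the simulator and the bot being trained.
PARAM_RANGES = {
    "mass": (0.8, 1.2),
    "friction": (0.5, 1.5),
    "sensor_noise_std": (0.0, 0.05),
}


def sample_params(rng, ranges=PARAM_RANGES):
    """Draw one set of simulator parameters uniformly from each range."""
    return PhysicsParams(**{name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()})


class ToyCartEnv:
    """Hypothetical 1-D point mass; friction is modeled as linear velocity damping."""

    def __init__(self, params, rng):
        self.params = params
        self.rng = rng
        self.position = 0.0
        self.velocity = 0.0

    def step(self, force, dt=0.05):
        accel = (force - self.params.friction * self.velocity) / self.params.mass
        self.velocity += accel * dt
        self.position += self.velocity * dt
        # Noisy position reading, so the policy also sees randomized sensor noise.
        return self.position + self.rng.gauss(0.0, self.params.sensor_noise_std)


def train_with_domain_randomization(num_episodes, steps_per_episode, policy, seed=0):
    """Run episodes, each in a freshly randomized environment; returns the sampled params."""
    rng = random.Random(seed)
    sampled = []
    for _ in range(num_episodes):
        params = sample_params(rng)  # new dynamics for every episode
        env = ToyCartEnv(params, rng)
        obs = 0.0
        for _ in range(steps_per_episode):
            obs = env.step(policy(obs))
        # A real trainer would update the policy here from the collected experience.
        sampled.append(params)
    return sampled
```

Calling `train_with_domain_randomization(5, 10, policy=lambda obs: -obs)` runs five short episodes, each under a different draw of physics parameters. The intuition is that a policy which must cope with this whole spread of dynamics is less likely to overfit to one exact simulator setting; in practice the ranges would be chosen per task rather than the placeholders used here.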
RESEARCH · CL_36602 · May 15 · 14:50

New OptMuon method enhances stochastic optimization with adaptive momentum

Researchers have introduced OptMuon, a novel adaptive momentum orthogonalization method for stochastic nonconvex optimization that calibrates update magnitudes from observed trajectories. This approach combines Muon-sty…

TOOL · CL_36050 · May 15 · 13:41

New method enhances vision-language models with group revision

Researchers have introduced a new group-revision optimization paradigm to improve object-level grounding in large vision-language models. This method addresses the limitations of sparse, response-level rewards in existi…

TOOL · CL_36969 · May 15 · 11:02

RL agent controls GenAI access to boost student learning

A new research paper proposes using reinforcement learning to control when students can access generative AI tools in educational settings. The study found that strategically timed access, managed by an RL agent, improv…

TOOL · CL_36068 · May 15 · 09:56

New E²PO framework enhances generative model alignment with human preference

Researchers have introduced a new framework called Embedding-perturbed Exploration Preference Optimization (E²PO) to address limitations in aligning generative models with human intent using reinforcement learning. Exis…

TOOL · CL_36975 · May 15 · 09:26

Lamarckian inheritance benefits robots in predictable, dynamic environments

Researchers have explored the impact of Lamarckian inheritance on evolutionary dynamics in dynamic environments for robotic agents. Their findings suggest that the benefit of Lamarckian inheritance, where learned traits…

TOOL · CL_33404 · May 14 · 06:05

New framework combines knowledge and RL for vehicle routing problems

Researchers have developed a new framework for solving the Capacitated Vehicle Routing Problem (CVRP), a complex logistics challenge. Their approach integrates knowledge-based heuristics with reinforcement learning, bre…

TOOL · CL_30955 · May 14 · 04:00

New framework unifies sampling and optimization problems

This paper introduces the multi-armed sampling problem, a new framework that mirrors the multi-armed bandit problem but focuses on sampling rather than optimization. Researchers have defined regret measures and establis…