reinforcement learning
PulseAugur coverage of reinforcement learning — every cluster mentioning reinforcement learning across labs, papers, and developer communities, ranked by signal.
- instance of Soft Actor-Critic Reinforcement Learning for Robotic Manipulator with Hindsight Experience Replay 95%
- used by large language models 90%
- used by GRPO 90%
- used by Markov decision process 90%
- instance of Multi-agent reinforcement learning 90%
- instance of Very Large Array 90%
- used by large language model 90%
- used by Soft Actor-Critic 90%
- developed by large language models 70%
- developed by GRPO 70%
- used by robotics 70%
- used by supervised fine-tuning 70%
- 2026-05-18, research milestone: A new paper proposes a reinforcement learning framework for modeling customer trajectories in retail.
- ELVIS: Ensemble-Calibrated Latent Imagination for Long-Horizon Visual MPC
Researchers have developed ELVIS, a novel approach to long-horizon visual planning in reinforcement learning that uses a Gaussian-mixture model predictive controller to maintain multiple hypotheses over extended rollout…
- RAST-MoE-RL framework enhances ride-hailing efficiency with specialized AI experts
Researchers have developed a new framework called RAST-MoE-RL to improve efficiency in ride-hailing services. This framework utilizes a Mixture-of-Experts (MoE) approach within deep reinforcement learning to better hand…
- AI research integrates reward shaping with control functions for safer UAV navigation
Researchers have developed a novel approach for Unmanned Aerial Vehicle (UAV) navigation that combines reinforcement learning with control Lyapunov and barrier functions. This method aims to improve both mission efficie…
- Infoprop Dyna enables Mini Wheelbot to learn racing in 11 minutes
Researchers have developed a new reinforcement learning framework called Infoprop Dyna that allows robots to learn complex tasks directly from real-world interactions, bypassing the need for traditional physics-based si…
- Researchers use RL to improve MLLM regression on imbalanced data
Researchers have developed a new framework to improve how multimodal large language models (MLLMs) handle numerical regression tasks, particularly those with imbalanced data distributions. Existing training methods ofte…
- New Omni-Fake dataset benchmarks multimodal deepfake detection on social media
Researchers have introduced Omni-Fake, a new benchmark dataset designed to improve the detection of multimodal deepfakes on social media. The dataset includes over 1 million samples across image, audio, video, and audio…
- New LEGIT dataset evaluates LLM legal reasoning with issue tree rubrics
Researchers have developed LEGIT, a new dataset containing 24,000 legal reasoning instances designed to evaluate the quality of LLM-generated legal arguments. This dataset converts court judgments into hierarchical tree…
- AI framework optimizes resource-constrained outbreak control using hierarchical reinforcement learning
Researchers have developed a hierarchical reinforcement learning framework to optimize the allocation of limited resources for controlling infectious disease outbreaks across multiple clusters. This approach uses a glob…
- UAV navigation enhanced with RL, safety functions
Researchers have developed a novel approach for autonomous UAV navigation that enhances both speed and safety. This method combines reinforcement learning with potential-based reward shaping, control Lyapunov functions,…
- New research advances bandit algorithms for control, causality, and multi-objective learning
Multiple research papers explore advancements in bandit algorithms across various domains. One study introduces a machine learning framework for optimal control of fluid restless multi-armed bandit problems, achieving s…
- AutoREC platform uses RL agents to generate circuit models from EIS data
Researchers have developed AutoREC, an open-source Python package designed to automate the generation of equivalent circuit models (ECMs) from electrochemical impedance spectroscopy (EIS) data. This platform utilizes re…
- Transformer RL optimizes 6G network function chain partitioning
Researchers have developed a new Transformer-based actor-critic reinforcement learning framework to address the challenges of partitioning Service Function Chains (SFCs) in future 6G networks. This approach utilizes sel…
- OpAgent achieves 71.6% success rate in web navigation tasks
Researchers have developed OpAgent, a novel web navigation agent that utilizes online reinforcement learning to overcome the limitations of static datasets. The agent employs a hierarchical multi-task fine-tuning approa…
- AI game teaches cybersecurity defense through interactive Q&A
Researchers have developed a novel educational framework called the Explainable Q20 Cybersecurity Recommender (EQ-20CR) that uses a game-inspired approach to teach cybersecurity. The system employs a reinforcement learn…
- FiLMMeD model uses Feature-wise Linear Modulation for multi-depot vehicle routing
Researchers have introduced FiLMMeD, a novel neural network model designed to tackle various multi-depot vehicle routing problems (MDVRP). This model enhances generalization by incorporating Feature-wise Linear Modulati…
- New paper derives exponential family results from single KL identity
Researchers have identified a fundamental identity for exponential families, which are distributions crucial to modern machine learning techniques like softmax and Gaussian distributions. This identity simplifies the de…
- New Kernelized Advantage Estimation improves LLM reasoning with nonparametric statistics
Researchers have introduced Kernelized Advantage Estimation (KAE) to enhance the reasoning capabilities of large language models (LLMs) through reinforcement learning. KAE addresses limitations in existing methods like …
- Surveys explore robot learning from human videos and world models, while new networks tackle driver monitoring
Two new survey papers explore advancements in robot learning, focusing on different data acquisition and utilization strategies. One paper provides a comprehensive review of world models, which are predictive representa…
- DORA system accelerates LLM reinforcement learning by 2-4x with novel asynchronous rollout
Researchers have developed DORA, a novel asynchronous reinforcement learning system designed to accelerate language model training. DORA addresses the bottleneck caused by long-tailed trajectories in the rollout phase b…
- New UPSi filter enhances safety in reinforcement learning with uncertainty quantification
Researchers have developed the Uncertainty-Aware Predictive Safety Filter (UPSi), a novel approach to enhance safety during reinforcement learning exploration. UPSi integrates probabilistic ensemble neural networks with…