reinforcement learning
PulseAugur coverage of reinforcement learning — every cluster mentioning reinforcement learning across labs, papers, and developer communities, ranked by signal.
- used by robotics 80%
- used by Large Language Models 70%
- used by Group Relative Policy Optimization 70%
- used by chain of thought 70%
- instance of Markov decision process 70%
- affiliated with supervised fine-tuning 70%
- instance of robotics 60%
- used by Markov decision process 60%
- other supervised fine-tuning 60%
7 days with sentiment data
-
Researchers use RL to improve MLLM regression on imbalanced data
Researchers have developed a new framework to improve how multimodal large language models (MLLMs) handle numerical regression tasks, particularly those with imbalanced data distributions. Existing training methods ofte…
-
New Omni-Fake dataset benchmarks multimodal deepfake detection on social media
Researchers have introduced Omni-Fake, a new benchmark dataset designed to improve the detection of multimodal deepfakes on social media. The dataset includes over 1 million samples across image, audio, video, and audio…
-
Infoprop Dyna enables Mini Wheelbot to learn racing in 11 minutes
Researchers have developed a new reinforcement learning framework called Infoprop Dyna that allows robots to learn complex tasks directly from real-world interactions, bypassing the need for traditional physics-based si…
-
AI research integrates reward shaping with control functions for safer UAV navigation
Researchers have developed a novel approach for Unmanned Aerial Vehicle (UAV) navigation that combines reinforcement learning with control Lyapunov and barrier functions. This method aims to improve both mission efficie…
-
RAST-MoE-RL framework enhances ride-hailing efficiency with specialized AI experts
Researchers have developed a new framework called RAST-MoE-RL to improve efficiency in ride-hailing services. This framework utilizes a Mixture-of-Experts (MoE) approach within deep reinforcement learning to better hand…
-
ELVIS: Ensemble-Calibrated Latent Imagination for Long-Horizon Visual MPC
Researchers have developed ELVIS, a novel approach to long-horizon visual planning in reinforcement learning that uses a Gaussian-mixture model predictive controller to maintain multiple hypotheses over extended rollout…
-
New DGPO framework improves LLM reasoning credit assignment
Researchers have introduced Distribution Guided Policy Optimization (DGPO), a new reinforcement learning framework designed to improve how large language models handle complex reasoning tasks. Current methods struggle w…
-
AI framework optimizes resource-constrained outbreak control using hierarchical reinforcement learning
Researchers have developed a hierarchical reinforcement learning framework to optimize the allocation of limited resources for controlling infectious disease outbreaks across multiple clusters. This approach uses a glob…
-
New LEGIT dataset evaluates LLM legal reasoning with issue tree rubrics
Researchers have developed LEGIT, a new dataset containing 24,000 legal reasoning instances designed to evaluate the quality of LLM-generated legal arguments. This dataset converts court judgments into hierarchical tree…
-
UAV navigation enhanced with RL, safety functions
Researchers have developed a novel approach for autonomous UAV navigation that enhances both speed and safety. This method combines reinforcement learning with potential-based reward shaping, control Lyapunov functions,…
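Potential-based reward shaping, one of the ingredients named above, can be sketched in a few lines. This is a minimal illustration of the textbook form, r' = r + γΦ(s') − Φ(s), and not the paper's implementation: the 2-D state, the goal position, the negative-distance potential, and the names `make_shaped_reward` and `neg_distance_to_goal` are all assumptions made for the example.

```python
from typing import Callable, Tuple

# Hypothetical state: a 2-D position (x, y). The paper's state space is not given here.
State = Tuple[float, float]


def make_shaped_reward(
    potential: Callable[[State], float], gamma: float
) -> Callable[..., float]:
    """Return a function that adds the shaping term gamma * phi(s') - phi(s)."""

    def shaped(
        base_reward: float, state: State, next_state: State, done: bool = False
    ) -> float:
        # Common episodic convention (an assumption here): the potential of a
        # terminal state is treated as zero.
        next_phi = 0.0 if done else potential(next_state)
        return base_reward + gamma * next_phi - potential(state)

    return shaped


# Illustrative goal and potential: negative Euclidean distance to the goal.
GOAL: State = (10.0, 0.0)


def neg_distance_to_goal(state: State) -> float:
    dx = state[0] - GOAL[0]
    dy = state[1] - GOAL[1]
    return -((dx * dx + dy * dy) ** 0.5)


shape = make_shaped_reward(neg_distance_to_goal, gamma=0.99)

# Moving one unit toward the goal earns a positive bonus; moving away is penalized.
toward = shape(0.0, (0.0, 0.0), (1.0, 0.0))
away = shape(0.0, (1.0, 0.0), (0.0, 0.0))
```

Because the bonus is a discounted difference of potentials, it mostly cancels along a trajectory, which is why this form of shaping is known to leave the optimal policy unchanged while still giving the agent a denser learning signal.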
-
New research advances bandit algorithms for control, causality, and multi-objective learning
Multiple research papers explore advancements in bandit algorithms across various domains. One study introduces a machine learning framework for optimal control of fluid restless multi-armed bandit problems, achieving s…
-
AI game teaches cybersecurity defense through interactive Q&A
Researchers have developed a novel educational framework called the Explainable Q20 Cybersecurity Recommender (EQ-20CR) that uses a game-inspired approach to teach cybersecurity. The system employs a reinforcement learn…
-
AutoREC platform uses RL agents to generate circuit models from EIS data
Researchers have developed AutoREC, an open-source Python package designed to automate the generation of equivalent circuit models (ECMs) from electrochemical impedance spectroscopy (EIS) data. This platform utilizes re…
-
OpAgent achieves 71.6% success rate in web navigation tasks
Researchers have developed OpAgent, a novel web navigation agent that utilizes online reinforcement learning to overcome the limitations of static datasets. The agent employs a hierarchical multi-task fine-tuning approa…
-
Transformer RL optimizes 6G network function chain partitioning
Researchers have developed a new Transformer-based actor-critic reinforcement learning framework to address the challenges of partitioning Service Function Chains (SFCs) in future 6G networks. This approach utilizes sel…
-
FiLMMeD model uses Feature-wise Linear Modulation for multi-depot vehicle routing
Researchers have introduced FiLMMeD, a novel neural network model designed to tackle various multi-depot vehicle routing problems (MDVRP). This model enhances generalization by incorporating Feature-wise Linear Modulati…
-
New paper derives exponential family results from single KL identity
Researchers have identified a fundamental identity for exponential families, the class of distributions behind common machine learning tools such as the softmax and the Gaussian. This identity simplifies the de…
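For context, one well-known KL identity for exponential families is shown below. It is the standard textbook result and is not confirmed to be the identity the paper builds on; the notation (natural parameter θ, sufficient statistic T, log-partition function A, base measure h) is assumed for this sketch.

```latex
% Exponential family in natural-parameter form (notation assumed):
p_\theta(x) = h(x)\,\exp\!\big(\theta^\top T(x) - A(\theta)\big),
\qquad
\nabla A(\theta) = \mathbb{E}_{p_\theta}[T(x)].

% KL divergence between two members of the same family:
\mathrm{KL}\big(p_\theta \,\|\, p_{\theta'}\big)
  = \mathbb{E}_{p_\theta}\!\big[(\theta - \theta')^\top T(x)\big] - A(\theta) + A(\theta')
  = A(\theta') - A(\theta) - (\theta' - \theta)^\top \nabla A(\theta).
```

The right-hand side is the Bregman divergence generated by the log-partition function A, so the KL divergence within such a family depends only on A and its gradient.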
-
New Kernelized Advantage Estimation improves LLM reasoning with nonparametric statistics
Researchers have introduced Kernelized Advantage Estimation (KAE) to enhance the reasoning capabilities of large language models (LLMs) through reinforcement learning. KAE addresses limitations in existing methods like …
-
DORA system accelerates LLM reinforcement learning by 2-4x with novel asynchronous rollout
Researchers have developed DORA, a novel asynchronous reinforcement learning system designed to accelerate language model training. DORA addresses the bottleneck caused by long-tailed trajectories in the rollout phase b…
-
Surveys explore robot learning from human videos and world models, while new networks tackle driver monitoring
Two new survey papers explore advancements in robot learning, focusing on different data acquisition and utilization strategies. One paper provides a comprehensive review of world models, which are predictive representa…