Soft Actor-Critic
PulseAugur coverage of Soft Actor-Critic: every cluster mentioning Soft Actor-Critic across labs, papers, and developer communities, ranked by signal.
- 2026-05-26 research_milestone Researchers introduce modifications to Soft Actor-Critic enabling it to match PPO performance for legged robot locomotion.
-
New hybrid controller enhances microrobotic cell manipulation in fluid flow
Researchers have developed a novel hybrid controller for microrobotic cell manipulation in fluid environments. This controller combines a model predictive control (MPC) system with a reinforcement learning (RL) policy t…
-
New AI method optimizes additive manufacturing with attention-based RL
Researchers have developed a novel approach to optimize additive manufacturing processes by integrating a multi-head attention mechanism with the Soft Actor-Critic (SAC) algorithm. This method addresses limitations in t…
-
New DRL Framework Optimizes Urban EV Fleet Control
Researchers have developed a new framework for controlling urban electric vehicle (EV) fleets that uses distributionally robust reinforcement learning (DRL) to handle uncertain demand and travel times. This approach, ca…
-
New LLM Training Methods Optimize Data Scheduling for Efficiency and Performance
Researchers have developed new methods for optimizing the training of large language models (LLMs) through advanced data scheduling techniques. One approach, the Holistic Data Scheduler (HDS), uses multi-objective reinf…
-
Quantum Circuits Enhance Financial Reinforcement Learning Stability
Researchers have developed FPQC-SAC, a novel variant of the Soft Actor-Critic (SAC) algorithm designed to improve stability in financial reinforcement learning tasks with low signal-to-noise ratios. This method incorpor…
-
New RL framework trains autonomous superbikes with self-paced learning
Researchers have developed a new framework for training autonomous agents to race superbikes in a simulated environment. This approach combines Soft Actor-Critic (SAC) with Self-Paced curriculum Deep Reinforcement Learn…
-
Transformer critic boosts reinforcement learning for long-horizon tasks
Researchers have developed a new sequence-conditioned critic for Soft Actor-Critic (SAC) that uses a lightweight Transformer to model trajectory context. This approach integrates N-step returns without importance sampli…
-
New RL algorithm optimizes stock trade execution
Researchers have developed a new reinforcement learning algorithm called TT-DAC-PS for optimizing stock trade execution. This deterministic actor-critic architecture incorporates several advanced techniques, including t…
-
AI robot masters air hockey using only simulator training
Researchers have developed an AI robot capable of playing air hockey against humans without any real-world practice, relying solely on simulator training. The project, a graduate thesis from the University of British Co…
-
New RL algorithm adds stability guarantees for physical systems
Researchers have developed a new reinforcement learning algorithm called LC-SAC, designed to provide stability guarantees for safety-critical physical systems. This algorithm integrates Lyapunov stability theory with So…
-
HVAC control costs quantified, replay buffer bias identified
Researchers have quantified the minimum achievable energy cost for HVAC control using Soft Actor-Critic (SAC) on a building simulator, finding it to be $35.51 per day. They identified that initializing the replay buffer…
-
Reinforcement learning uses dynamic entropy tuning for better quadcopter control
Researchers have investigated the impact of dynamic entropy tuning in reinforcement learning for quadcopter control. They compared stochastic policies, which optimize a probability distribution over actions, against det…
-
Quadrotor control system uses Soft Actor-Critic for improved performance
Researchers have developed a quadrotor control system based on reinforcement learning (RL), specifically the Soft Actor-Critic (SAC) algorithm. This method focuses on controlling the quadrotor's t…
-
New PIRS method enhances building energy management with physics-informed rewards
Researchers have developed PIRS (Physics-Informed Reward Shaping), a novel method for optimizing building energy management using deep reinforcement learning. PIRS replaces ad-hoc comfort proxies with the ISO 7730 Predi…
-
LLM framework OccuReward enhances demographic equity in building energy management
Researchers have developed OccuReward, a framework that uses LLMs to shape reward functions for energy management in grid-interactive buildings, aiming to improve demographic equity. The system utilizes the Gemini API t…
-
Modified Soft Actor-Critic algorithm matches PPO performance for robot locomotion
Researchers have developed a modified version of the Soft Actor-Critic (SAC) algorithm that matches the performance of Proximal Policy Optimization (PPO) in training legged robots. This new approach addresses SAC's samp…
-
Deep Learning Frameworks Enhance Portfolio Optimization Strategies
Researchers are developing advanced deep learning frameworks for portfolio optimization, aiming to improve financial market performance. One approach uses neural networks to directly optimize financial metrics like Shar…
-
Researchers fix synthetic data failures in reinforcement learning policy optimization
Researchers have identified and addressed algorithmic failures in Model-Based Policy Optimization (MBPO), a technique used in reinforcement learning. The study found that MBPO can underperform compared to other methods …
-
LLM judges evaluate agentic stock predictors, improving accuracy via reinforcement learning
Researchers have developed a novel framework for evaluating agentic stock prediction systems by utilizing large language models as judges. This system breaks down performance into six specific dimensions, including regi…
-
Recurrent RL improves chemotherapy control under partial patient observability
Researchers have developed a recurrent deep reinforcement learning approach to optimize chemotherapy dosing under conditions where a patient's full state is not observable. By using memory-augmented policies with LSTM a…