Soft Actor-Critic
PulseAugur coverage of Soft Actor-Critic: every cluster mentioning Soft Actor-Critic across labs, papers, and developer communities, ranked by signal.
- 2026-05-26 research_milestone Researchers introduce modifications to Soft Actor-Critic enabling it to match PPO performance for legged robot locomotion.
Sentiment data available for 2 days
-
Modified Soft Actor-Critic algorithm matches PPO performance for robot locomotion
Researchers have developed a modified version of the Soft Actor-Critic (SAC) algorithm that matches the performance of Proximal Policy Optimization (PPO) in training legged robots. This new approach addresses SAC's samp…
-
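Deep reinforcement learning framework for global portfolio management
Researchers have developed a deep reinforcement learning framework for dynamically managing portfolios across global equity markets. The system uses the Soft Actor-Critic algorithm to optimize continuous portfolio weights by incorporating transaction costs, turnover penalties, and diversification constraints into its reward function. Although the framework shows promise, particularly on the EURO STOXX 50 index and during periods of high market uncertainty, it did not consistently outperform a simple buy-and-hold strategy across all tested markets.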
-
Researchers fix synthetic data failures in reinforcement learning policy optimization
Researchers have identified and addressed algorithmic failures in Model-Based Policy Optimization (MBPO), a technique used in reinforcement learning. The study found that MBPO can underperform compared to other methods …
-
LLM judges evaluate agentic stock predictors, improving accuracy via reinforcement learning
Researchers have developed a novel framework for evaluating agentic stock prediction systems by utilizing large language models as judges. This system breaks down performance into six specific dimensions, including regi…
-
Recurrent RL improves chemotherapy control under partial patient observability
Researchers have developed a recurrent deep reinforcement learning approach to optimize chemotherapy dosing under conditions where a patient's full state is not observable. By using memory-augmented policies with LSTM a…
-
Researchers develop semi-Markov RL for city-scale EV ride-hailing
Researchers have developed a novel semi-Markov reinforcement learning approach for optimizing city-scale electric vehicle (EV) ride-hailing fleets. This method addresses complex decisions like dispatch, repositioning, a…
-
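Researchers develop semi-Markov reinforcement learning for EV ride-hailing, improving profit while ensuring feasibility
Researchers have developed a novel semi-Markov reinforcement learning method for managing large-scale electric vehicle ride-hailing fleets. The method ensures that dispatch, repositioning, and charging decisions strictly respect physical constraints such as charger and feeder limits, even when demand and travel times are uncertain. The system uses a masked actor to produce high-level intents, which are then projected through mixed-integer linear programming to guarantee feasibility. Experiments on a simulator built from the New York City taxi dataset show that the method, named PD-RSAC, significantly outperforms baseline methods, reaching a net profit of $1.22 million while avoiding any feeder-limit violations.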
-
AI accelerates wind farm control using reinforcement learning
Researchers have developed new reinforcement learning techniques to improve wind farm control efficiency. One method uses expert demonstrations from steady-state models to accelerate training and enhance initial perform…
-
AI uses reinforcement learning for aircraft upset recovery and collision avoidance
Researchers have developed two distinct AI systems for advanced jet trainers using reinforcement learning. One system, a Pilot Activated Recovery System (PARS), aims to enhance operational efficiency by providing AI-dri…
-
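AI framework uses causal generative adversarial networks, reinforcement learning, and LLM evaluation to forecast bond yields
Researchers have developed a novel bond yield forecasting framework that creates synthetic financial data using causal generative adversarial networks (CausalGANs) and reinforcement learning. This synthetic data, which includes macroeconomic variables, is used to train a fine-tuned large language model, Qwen2.5-7B, to generate trading signals and risk assessments. The evaluation indicates that its predictive performance exceeds that of existing methods, with the reinforcement learning approach achieving a low mean absolute error of 0.103%. The study combines synthetic data generation, LLM-driven financial forecasting, and LLM-based evaluation to support AI-driven financial decision-making.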