Bridging the Gap: Enabling Soft Actor Critic for High Performance Legged Locomotion
Researchers have developed a modified version of the Soft Actor-Critic (SAC) algorithm that matches the performance of Proximal Policy Optimization (PPO) when training legged robots. PPO is an on-policy method and discards its data after each update, whereas SAC is off-policy and can reuse past experience, which makes it suitable for sim-to-real transfer and online learning on physical hardware. The modifications cover policy initialization, critic targets, and return estimation, and they allow SAC to train stably at scale across various robot platforms and locomotion tasks.
AI IMPACT Enables more efficient training of legged robots, potentially accelerating sim-to-real transfer and real-time adaptation.
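For readers unfamiliar with the terms in the summary, the following is a minimal sketch of two textbook SAC ingredients: a replay buffer (the mechanism by which past experience is reused) and the standard soft critic target. It does not reproduce the modified critic targets or return estimation from the work described here, whose details the summary does not give. The names `ReplayBuffer` and `soft_td_target` and the default values of `gamma` and `alpha` are illustrative assumptions.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of past transitions; the oldest are evicted first."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement, so old data is reused for updates.
        return random.sample(list(self.storage), batch_size)


def soft_td_target(reward, done, next_q1, next_q2, next_log_prob,
                   gamma=0.99, alpha=0.2):
    """Textbook SAC critic target (assumed form, not the paper's variant):
    r + gamma * (1 - done) * (min(Q1', Q2') - alpha * log pi(a'|s'))."""
    next_value = min(next_q1, next_q2) - alpha * next_log_prob
    return reward + gamma * (1.0 - float(done)) * next_value
```

In a full training loop, each sampled transition would feed `soft_td_target` with Q-values from two target critics and the log-probability of an action drawn from the current policy at the next state.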