PulseAugur

Proximal Policy Optimization

PulseAugur coverage of Proximal Policy Optimization — every cluster mentioning Proximal Policy Optimization across labs, papers, and developer communities, ranked by signal.

Total · 30d: 86 (86 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 81 (81 over 90d)
TIMELINE
  1. 2026-05-26 research_milestone A new method is proposed to stabilize reinforcement learning training by strategically dropping transitions.
SENTIMENT · 30D: 19 days with sentiment data

RECENT · PAGE 1/5 · 86 TOTAL
  1. TOOL · CL_111775 ·

    AI policies learn cybersecurity penetration testing faster with history aggregation

    Researchers have developed and evaluated reinforcement learning policies for penetration testing in cybersecurity scenarios with partial observability. They compared several Proximal Policy Optimization (PPO) variants, …

  2. RESEARCH · CL_111264 ·

    New research revisits action factorization for complex RL spaces · 2 sources tracked

    A new research paper explores methods for handling complex action spaces in reinforcement learning, particularly those that combine discrete and continuous actions. The study analyzes various factorization techniques ac…

  3. TOOL · CL_109462 ·

    New model simulates tuberculosis spread in Mars colony

    Researchers have developed a new model to simulate the spread of latent tuberculosis within a radiation-exposed Mars colony. The model links galactic cosmic radiation to immune competence, which in turn affects the reac…

  4. RESEARCH · CL_107758 ·

    New RL framework uses vision-language models for GUI agent supervision

    Researchers have developed a new reinforcement learning framework for Computer-Use Agents (CUAs) that leverages autonomous vision-language evaluation for supervision. This approach addresses the challenge of obtaining s…

  5. RESEARCH · CL_107703 ·

    EMAgnet introduces adaptive regularization for policy gradient self-play

    Researchers have developed EMAgnet, a novel parameter-space exponential moving average (EMA) regularization technique for policy gradient self-play in large games. Unlike previous methods that use a uniform distribution…

  6. RESEARCH · CL_107869 ·

    New research unifies PPO-Clip and KL-PPO algorithms

    Researchers have demonstrated that the clipped surrogate gradient in Proximal Policy Optimization (PPO) can be precisely replicated by a Kullback-Leibler surrogate with a per-sample coefficient. This equivalence holds t…

  7. TOOL · CL_106809 ·

    CoorDex enables humanoid robots to manipulate objects while walking

    Researchers have developed CoorDex, a new learning pipeline designed to enable dexterous humanoid robots to perform manipulation tasks while in motion. This system converts high-dimensional body and hand control into co…

  8. TOOL · CL_103645 ·

    Humanoid robot 'cerebellum' gets GPT-style model with 2B frames of motion data

    Researchers have introduced AstraBrain-WBC 0.5, a novel GPT-style foundational model designed for humanoid robot general cerebellum control. This model leverages a massive dataset of 2 billion frames of human motion dat…

  9. TOOL · CL_99100 ·

    RLAIF and PPO: Key Techniques for Enhancing LLM Behavior

    This article explores Reinforcement Learning from AI Feedback (RLAIF) and Proximal Policy Optimization (PPO) as key techniques for improving large language model behavior. It details how a combination of a reward model,…

  10. RESEARCH · CL_99596 ·

    New AI method optimizes additive manufacturing with attention-based RL

    Researchers have developed a novel approach to optimize additive manufacturing processes by integrating a multi-head attention mechanism with the Soft Actor-Critic (SAC) algorithm. This method addresses limitations in t…

  11. TOOL · CL_98217 ·

    Graph RL router boosts quantum circuit fidelity using calibration data

    Researchers have developed a new quantum circuit routing method using graph reinforcement learning that incorporates calibration data from quantum processors. This approach, trained with proximal policy optimization and…

  12. RESEARCH · CL_106805 ·

    New methods enhance VLA model efficiency and performance in robotics · 9 sources tracked

    Researchers are developing new methods to improve the efficiency and performance of Vision-Language-Action (VLA) models in robotics. One approach, Flow Policy Optimization (FPO), uses reinforcement learning to fine-tune…

  13. RESEARCH · CL_99607 ·

    New research explores RL advancements for LLMs and AI agents · 8 sources tracked

    Multiple research papers released on arXiv explore advancements in reinforcement learning (RL) for large language models (LLMs) and other AI agents. One paper introduces RiVER, a framework for training LLMs on score-bas…

  14. RESEARCH · CL_98173 ·

    Model-free RL controllers enhance cyber-physical system resilience against attacks · arXiv paper

    A new research paper published on arXiv explores the effectiveness of model-free reinforcement learning (RL) controllers in enhancing the resilience of cyber-physical systems against cyberattacks. The study analyzes fou…

  15. TOOL · CL_97461 ·

    SIQ-1 fine-tune of Qwen3.6 shows Opus-like reasoning, beats GPT-5.5

    A new model, SIQ-1, has been developed by fine-tuning Qwen-35B-A3 using PPO. This model demonstrates strong performance on autoresearch tasks, outperforming GLM-5.2 and Qwen-350B, with its generated ideas reportedly com…

  16. TOOL · CL_96223 ·

    Mamba and PPO achieve superior safety in spacecraft control

    A new research paper explores the effectiveness of various recurrent neural network architectures and reinforcement learning algorithms for adaptive safety-critical control in spacecraft proximity operations. The study …

  17. RESEARCH · CL_96176 ·

    New pipeline enables humanoid robots to manipulate objects while walking

    Researchers have developed CoorDex, a novel learning pipeline that enables humanoid robots to perform dexterous manipulation while in motion. This system converts high-dimensional body and hand control into coordinated …

  18. TOOL · CL_93441 ·

    New RL framework learns graph partitioning with structural priors

    Researchers have developed RIDGECUT, a novel reinforcement learning framework designed for graph partitioning problems, specifically targeting the Normalized Cut problem. This method incorporates domain knowledge by con…

  19. COMMENTARY · CL_92899 ·

    AI Alignment: RLHF, DPO, IPO, and KTO Tradeoffs Explored

    The choice of AI model alignment method—RLHF, DPO, IPO, or KTO—significantly impacts project timelines and resource allocation. RLHF, a multi-stage process involving a reward model and PPO, is compute-intensive and can …

  20. RESEARCH · CL_93059 ·

    AI estimates food material properties using reinforcement learning

    Researchers have developed a novel approach using latent space reinforcement learning to estimate material properties in food fracture simulations, specifically demonstrated with orange peeling. This method trains a goa…