ENTITY Proximal Policy Optimization

Proximal Policy Optimization

PulseAugur coverage of Proximal Policy Optimization: every cluster that mentions it across labs, papers, and developer communities, ranked by signal.


Total · 30d

86 over 90d

Releases · 30d

0 over 90d

Papers · 30d

81 over 90d

TIER MIX · 90D

research 33
tool 51
commentary 2

TOPICS

paper 81
other 45
product 18
model release 16
safety 11
infra 9

RELATIONSHIPS

instance of Pfadfinder und Pfadfinderinnen Österreichs 90%
instance of reinforcement learning 90%
instance of deep reinforcement learning 90%
used by large-language models 90%
developed Advantage Actor-Critic 90%
used by long short-term memory 90%
used by reinforcement learning 70%
developed GRPO 70%
uses GRPO 70%
used by reinforcement learning from human feedback 70%
used by Pfadfinder und Pfadfinderinnen Österreichs 70%
instance of Direct Preference Optimization 70%

TIMELINE

2026-05-26 research_milestone A new method is proposed to stabilize reinforcement learning training by strategically dropping transitions.

SENTIMENT · 30D

19 days with sentiment data

RECENT · 86 TOTAL

TOOL · CL_111775 · Jun 26 · 04:00

AI policies learn cybersecurity penetration testing faster with history aggregation

Researchers have developed and evaluated reinforcement learning policies for penetration testing in cybersecurity scenarios with partial observability. They compared several Proximal Policy Optimization (PPO) variants, …
RESEARCH · CL_111264 · Jun 25 · 03:48

New research revisits action factorization for complex RL spaces · 2 sources tracked

A new research paper explores methods for handling complex action spaces in reinforcement learning, particularly those that combine discrete and continuous actions. The study analyzes various factorization techniques ac…
TOOL · CL_109462 · Jun 24 · 11:49

New model simulates tuberculosis spread in Mars colony

Researchers have developed a new model to simulate the spread of latent tuberculosis within a radiation-exposed Mars colony. The model links galactic cosmic radiation to immune competence, which in turn affects the reac…
RESEARCH · CL_107758 · Jun 23 · 12:46

New RL framework uses vision-language models for GUI agent supervision

Researchers have developed a new reinforcement learning framework for Computer-Use Agents (CUAs) that leverages autonomous vision-language evaluation for supervision. This approach addresses the challenge of obtaining s…
RESEARCH · CL_107703 · Jun 22 · 23:05

EMAgnet introduces adaptive regularization for policy gradient self-play

Researchers have developed EMAgnet, a novel parameter-space exponential moving average (EMA) regularization technique for policy gradient self-play in large games. Unlike previous methods that use a uniform distribution…
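The summary is cut off before EMAgnet's adaptive weighting is described, so the sketch below shows only the generic idea it builds on: an exponential moving average of the policy's parameters, plus a quadratic penalty that pulls the current parameters toward that average. The fixed `decay` and `coef` values and all function names are illustrative assumptions, not EMAgnet's actual method.

```python
def ema_update(ema_params, params, decay=0.99):
    """Return the new parameter-space EMA: decay * ema + (1 - decay) * current."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]


def ema_penalty(params, ema_params, coef=0.1):
    """Quadratic (L2) penalty pulling current parameters toward their EMA."""
    return coef * sum((p - e) ** 2 for p, e in zip(params, ema_params))


def ema_penalty_grad(params, ema_params, coef=0.1):
    """Gradient of ema_penalty with respect to the current parameters."""
    return [2.0 * coef * (p - e) for p, e in zip(params, ema_params)]
```

In a self-play loop one would add `ema_penalty_grad` to the policy-gradient update at each step and then call `ema_update`; a uniform (fixed-decay) average like this is the kind of baseline the summary says EMAgnet departs from.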
RESEARCH · CL_107869 · Jun 22 · 20:52

New research unifies PPO-Clip and KL-PPO algorithms

Researchers have demonstrated that the clipped surrogate gradient in Proximal Policy Optimization (PPO) can be precisely replicated by a Kullback-Leibler surrogate with a per-sample coefficient. This equivalence holds t…
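For reference, the PPO-Clip surrogate the paper starts from is, per sample, min(r·A, clip(r, 1-ε, 1+ε)·A), where r is the probability ratio between the new and old policies and A is the advantage estimate. A minimal pure-Python sketch, assuming log-probabilities and advantages are already computed; the function names are hypothetical, and the paper's per-sample KL coefficient is not given in the summary, so it is not reproduced here.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Mean PPO-Clip surrogate over a batch (an objective to maximize)."""
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)                # r = pi_new / pi_old
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)  # clip(r, 1-eps, 1+eps)
        total += min(ratio * adv, clipped * adv)         # pessimistic (lower) bound
    return total / len(advantages)


def clip_is_active(logp_new, logp_old, adv, eps=0.2):
    """True when min() selects the constant clipped term, so the sample's gradient is zero."""
    ratio = math.exp(logp_new - logp_old)
    return (adv > 0 and ratio > 1.0 + eps) or (adv < 0 and ratio < 1.0 - eps)
```

Samples flagged by `clip_is_active` contribute no gradient under PPO-Clip, so any surrogate with an identical gradient, such as the per-sample KL form described above, must also switch those samples off.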
TOOL · CL_106809 · Jun 22 · 17:59

CoorDex enables humanoid robots to manipulate objects while walking

Researchers have developed CoorDex, a new learning pipeline designed to enable dexterous humanoid robots to perform manipulation tasks while in motion. This system converts high-dimensional body and hand control into co…
TOOL · CL_103645 · Jun 22 · 09:48

Humanoid robot 'cerebellum' gets GPT-style model with 2B frames of motion data

Researchers have introduced AstraBrain-WBC 0.5, a novel GPT-style foundation model designed to act as a general-purpose "cerebellum" controller for humanoid robots. This model leverages a massive dataset of 2 billion frames of human motion dat…
TOOL · CL_99100 · Jun 18 · 18:36

RLAIF and PPO: Key Techniques for Enhancing LLM Behavior

This article explores Reinforcement Learning from AI Feedback (RLAIF) and Proximal Policy Optimization (PPO) as key techniques for improving large language model behavior. It details how a combination of a reward model,…
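The summary is truncated before the article's exact recipe, but a common way PPO is combined with a reward model in RLHF/RLAIF fine-tuning is to shape per-token rewards: each token is penalized by its log-probability gap to a frozen reference model, and the reward-model (or AI-feedback) score is added on the final token. A minimal sketch under that assumption; the function name and the `beta` value are illustrative.

```python
def shaped_rewards(rm_score, logp_policy, logp_ref, beta=0.1):
    """Per-token rewards for PPO in RLHF/RLAIF-style fine-tuning.

    Each token gets a KL-style penalty -beta * (log pi(a|s) - log pi_ref(a|s));
    the reward-model score for the whole response is added on the last token.
    """
    if len(logp_policy) != len(logp_ref) or not logp_policy:
        raise ValueError("need equal-length, non-empty log-prob sequences")
    rewards = [-beta * (lp - lr) for lp, lr in zip(logp_policy, logp_ref)]
    rewards[-1] += rm_score  # sequence-level score credited at the end
    return rewards
```

These rewards would then feed an advantage estimator and the usual PPO-Clip update; the KL term keeps the fine-tuned policy from drifting far from the reference model while it chases the reward model's score.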
RESEARCH · CL_99596 · Jun 18 · 11:07

New AI method optimizes additive manufacturing with attention-based RL

Researchers have developed a novel approach to optimize additive manufacturing processes by integrating a multi-head attention mechanism with the Soft Actor-Critic (SAC) algorithm. This method addresses limitations in t…
TOOL · CL_98217 · Jun 18 · 04:00

Graph RL router boosts quantum circuit fidelity using calibration data

Researchers have developed a new quantum circuit routing method using graph reinforcement learning that incorporates calibration data from quantum processors. This approach, trained with proximal policy optimization and…
RESEARCH · CL_106805 · Jun 18 · 00:00

New methods enhance VLA model efficiency and performance in robotics · 9 sources tracked

Researchers are developing new methods to improve the efficiency and performance of Vision-Language-Action (VLA) models in robotics. One approach, Flow Policy Optimization (FPO), uses reinforcement learning to fine-tune…
RESEARCH · CL_99607 · Jun 18 · 00:00

New research explores RL advancements for LLMs and AI agents · 8 sources tracked

Multiple research papers released on arXiv explore advancements in reinforcement learning (RL) for large language models (LLMs) and other AI agents. One paper introduces RiVER, a framework for training LLMs on score-bas…
RESEARCH · CL_98173 · Jun 17 · 13:43

Model-free RL controllers enhance cyber-physical system resilience against attacks · arXiv paper

A new research paper published on arXiv explores the effectiveness of model-free reinforcement learning (RL) controllers in enhancing the resilience of cyber-physical systems against cyberattacks. The study analyzes fou…
TOOL · CL_97461 · Jun 17 · 12:35

SIQ-1 fine-tune of Qwen3.6 shows Opus-like reasoning, beats GPT-5.5

A new model, SIQ-1, has been developed by fine-tuning Qwen-35B-A3 using PPO. This model demonstrates strong performance on autoresearch tasks, outperforming GLM-5.2 and Qwen-350B, with its generated ideas reportedly com…
TOOL · CL_96223 · Jun 17 · 04:00

Mamba and PPO achieve superior safety in spacecraft control

A new research paper explores the effectiveness of various recurrent neural network architectures and reinforcement learning algorithms for adaptive safety-critical control in spacecraft proximity operations. The study …
RESEARCH · CL_96176 · Jun 17 · 04:00

New pipeline enables humanoid robots to manipulate objects while walking

Researchers have developed CoorDex, a novel learning pipeline that enables humanoid robots to perform dexterous manipulation while in motion. This system converts high-dimensional body and hand control into coordinated …
TOOL · CL_93441 · Jun 16 · 04:00

New RL framework learns graph partitioning with structural priors

Researchers have developed RIDGECUT, a novel reinforcement learning framework designed for graph partitioning problems, specifically targeting the Normalized Cut problem. This method incorporates domain knowledge by con…
COMMENTARY · CL_92899 · Jun 16 · 01:08

AI Alignment: RLHF, DPO, IPO, and KTO Tradeoffs Explored

The choice of AI model alignment method—RLHF, DPO, IPO, or KTO—significantly impacts project timelines and resource allocation. RLHF, a multi-stage process involving a reward model and PPO, is compute-intensive and can …
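The contrast the summary draws, a reward model plus PPO versus optimizing directly on preference pairs, shows up in the DPO objective, which needs neither a separate reward model nor an on-policy sampling loop. A minimal sketch of the standard DPO loss for one preference pair, assuming summed sequence log-probabilities are already available from the policy and a frozen reference model; the function name and argument layout are illustrative, and the IPO and KTO losses are not shown.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair: -log(sigmoid(beta * margin))."""
    # Margin: how much more the policy (relative to the reference)
    # prefers the chosen response over the rejected one.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    z = beta * margin
    # -log(sigmoid(z)) == log(1 + exp(-z)); branch on sign to avoid overflow.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

When the policy matches the reference the loss is log 2, and it falls as the policy shifts probability toward the chosen response.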
RESEARCH · CL_93059 · Jun 15 · 15:47

AI estimates food material properties using reinforcement learning

Researchers have developed a novel approach using latent space reinforcement learning to estimate material properties in food fracture simulations, specifically demonstrated with orange peeling. This method trains a goa…
