ENTITY Group Relative Policy Optimization (GRPO)

PulseAugur coverage of Group Relative Policy Optimization (GRPO): every cluster that mentions GRPO across labs, papers, and developer communities, ranked by signal.

Total · 30d

8 over 90d

Releases · 30d

0 over 90d

Papers · 30d

8 over 90d

RECENT · PAGE 1/1 · 8 TOTAL

TOOL · CL_72659 · Jun 5 · 04:00

AI models trained to express feelings, but with trade-offs

Researchers have developed a method to train large language models to express feelings, intentions, and self-awareness. This approach, called Human-like Model eXpressions of Feeling (HMX-feel), uses self-rewarded reinfo…
RESEARCH · CL_62262 · May 29 · 10:53

New FOCUS framework enhances object localization in vision models

Researchers have developed a new framework called FOCUS to improve in-context object localization in vision-language models. This method uses a two-stage training process that optimizes attention between support images …
TOOL · CL_58817 · May 29 · 04:00

New Video Anomaly Model 'CaC' Improves Detection Accuracy

Researchers have introduced Concentrate and Concentrate (CaC), a novel anomaly detection model for videos that leverages Vision-Language Models. CaC employs a coarse-to-fine approach, first identifying anomalous time wi…
TOOL · CL_38812 · May 18 · 17:50

SafeDiffusion-R1 enhances image model safety with online reward steering

Researchers have developed SafeDiffusion-R1, a new framework for enhancing the safety of diffusion models. This method utilizes an online reinforcement learning approach with Group Relative Policy Optimization (GRPO) to…
RESEARCH · CL_45016 · May 16 · 15:11

AI agents show promise in supply chains but face reliability and security risks

A new research paper explores the use of autonomous AI agents in supply chain management, demonstrating that while advanced models can significantly reduce costs, they also introduce reliability risks such as 'agent bul…
TOOL · CL_27968 · May 11 · 17:59

New SLAS method enhances text-to-image model training

Researchers have developed a new method called Super-Linear Advantage Shaping (SLAS) to improve text-to-image models trained with reinforcement learning. This technique addresses reward hacking by reshaping the policy s…
TOOL · CL_25604 · May 8 · 07:22

LoRA rank allocation fails in RL fine-tuning, study finds

A new study on the Qwen 2.5 1.5B model reveals that adaptive rank allocation techniques, effective in supervised fine-tuning, do not translate to reinforcement learning with Group Relative Policy Optimization (GRPO). Re…
TOOL · CL_26962 · May 8 · 05:37

New SRPO method enhances multimodal reasoning in vision-language models

Researchers have introduced Structured Role-aware Policy Optimization (SRPO), a novel method to enhance the reasoning abilities of large vision-language models (LVLMs). SRPO addresses the limitation of current reinforce…
