ENTITY DAPO++

PulseAugur coverage of DAPO++ — every cluster mentioning DAPO++ across labs, papers, and developer communities, ranked by signal.

Total · 30d: 12 (12 over 90d)

Releases · 30d: 0 (0 over 90d)

Papers · 30d: 11 (11 over 90d)

SENTIMENT · 30D

5 days with sentiment data

RECENT · PAGE 1/1 · 12 TOTAL

TOOL · CL_105159 · Jun 22 · 11:51

New CFPO framework enhances multimodal reasoning in LVLMs

Researchers have introduced CounterFactual Policy Optimization (CFPO), a new framework designed to improve multimodal reasoning in Large Vision-Language Models (LVLMs). CFPO addresses grounding failures and hallucinatio…
TOOL · CL_104743 · Jun 21 · 16:14

New RLVR method ACPO enhances LLM reasoning capabilities

Researchers have analyzed Reinforcement Learning from Verifiable Rewards (RLVR) to understand its impact on large language model reasoning. Their theoretical analysis revealed that the degree of off-policy learning, inf…
TOOL · CL_93414 · Jun 16 · 04:00

New DUPL method boosts multimodal reasoning in LLMs

Researchers have introduced DUPL, a novel policy learning approach designed to enhance multimodal reasoning in large language models. This method specifically addresses the challenge of distinguishing between uncertaint…
RESEARCH · CL_91346 · Jun 15 · 00:00

New RL methods enhance LLM training stability and efficiency · 7 sources tracked

Researchers have developed several new methods to improve the stability and efficiency of reinforcement learning (RL) in large language models (LLMs). STARE addresses policy entropy collapse by reweighting token-level a…
RESEARCH · CL_65616 · Jun 1 · 13:20

New SAGC method boosts synchronous RL training efficiency

Researchers have developed a new method called Straggler-Aware Group Control (SAGC) to improve the efficiency of synchronous on-policy reinforcement learning. SAGC dynamically adjusts the training group size during oper…
RESEARCH · CL_53799 · May 27 · 04:00

New RLVR methods enhance LLM reasoning via first-token diversification and credit assignment

Two new research papers explore methods to improve Reinforcement Learning with Verifiable Rewards (RLVR) for training reasoning models. The first paper introduces REFT (Rollout Exploration with First-Token Diversificati…
TOOL · CL_53717 · May 27 · 04:00

New framework optimizes RL post-training for LLMs

A new framework called Pilot-Commit has been developed to optimize the allocation of computational resources during the post-training phase of large language models using reinforcement learning. This method addresses th…
RESEARCH · CL_51033 · May 26 · 04:00

New RLVR methods boost LLM training efficiency and data selection

Researchers are developing new methods to improve the efficiency and effectiveness of Reinforcement Learning with Verifiable Rewards (RLVR) for training Large Language Models (LLMs). Two papers introduce novel data sele…
TOOL · CL_44357 · May 22 · 15:57

Anyscale launches skill to automate LLM post-training runs

Anyscale has introduced a new Anyscale Agent Skill designed to simplify and automate the process of generating LLM post-training runs. This skill assists users in selecting the most appropriate post-training method, suc…
RESEARCH · CL_44028 · May 21 · 16:45

New method stabilizes LLM reasoning by rescuing near-boundary signals

Researchers have identified a key bottleneck in Reinforcement Learning from Verifiable Rewards (RLVR) that hinders LLM reasoning optimization. The study pinpoints rigid clipping decisions in standard hard-clipping metho…
TOOL · CL_35221 · May 17 · 03:42

New PRISM framework corrects SFT flaws in multimodal LLM training

New research from institutions including the Hong Kong University of Science and Technology (Guangzhou) reveals a critical flaw in the common post-training paradigm for multimodal large language models (MLLMs). The stan…
RESEARCH · CL_09211 · Apr 29 · 15:01

IBM releases Granite 4.1 LLMs with 512K context and Apache 2.0 license

IBM has released the Granite 4.1 family of large language models, comprising 3B, 8B, and 30B parameter versions. These models were trained on approximately 15 trillion tokens through a five-stage pre-training process th…