
DAPO++

PulseAugur coverage of DAPO++: every cluster mentioning DAPO++ across labs, papers, and developer communities, ranked by signal.

Total · 30d: 12 (12 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 11 (11 over 90d)
SENTIMENT · 30D

5 days with sentiment data

RECENT · 12 TOTAL
  1. TOOL · CL_105159

    New CFPO framework enhances multimodal reasoning in LVLMs

    Researchers have introduced CounterFactual Policy Optimization (CFPO), a new framework designed to improve multimodal reasoning in Large Vision-Language Models (LVLMs). CFPO addresses grounding failures and hallucinatio…

  2. TOOL · CL_104743

    New RLVR method ACPO enhances LLM reasoning capabilities

    Researchers have analyzed Reinforcement Learning from Verifiable Rewards (RLVR) to understand its impact on large language model reasoning. Their theoretical analysis revealed that the degree of off-policy learning, inf…

  3. TOOL · CL_93414

    New DUPL method boosts multimodal reasoning in LLMs

    Researchers have introduced DUPL, a novel policy learning approach designed to enhance multimodal reasoning in large language models. This method specifically addresses the challenge of distinguishing between uncertaint…

  4. RESEARCH · CL_91346

    New RL methods enhance LLM training stability and efficiency · 7 sources tracked

    Researchers have developed several new methods to improve the stability and efficiency of reinforcement learning (RL) in large language models (LLMs). STARE addresses policy entropy collapse by reweighting token-level a…

  5. RESEARCH · CL_65616

    New SAGC method boosts synchronous RL training efficiency

    Researchers have developed a new method called Straggler-Aware Group Control (SAGC) to improve the efficiency of synchronous on-policy reinforcement learning. SAGC dynamically adjusts the training group size during oper…

  6. RESEARCH · CL_53799

    New RLVR methods enhance LLM reasoning via first-token diversification and credit assignment

    Two new research papers explore methods to improve Reinforcement Learning with Verifiable Rewards (RLVR) for training reasoning models. The first paper introduces REFT (Rollout Exploration with First-Token Diversificati…

  7. TOOL · CL_53717

    New framework optimizes RL post-training for LLMs

    A new framework called Pilot-Commit has been developed to optimize the allocation of computational resources during the post-training phase of large language models using reinforcement learning. This method addresses th…

  8. RESEARCH · CL_51033

    New RLVR methods boost LLM training efficiency and data selection

    Researchers are developing new methods to improve the efficiency and effectiveness of Reinforcement Learning with Verifiable Rewards (RLVR) for training Large Language Models (LLMs). Two papers introduce novel data sele…

  9. TOOL · CL_44357

    Anyscale launches skill to automate LLM post-training runs

    Anyscale has introduced a new Anyscale Agent Skill designed to simplify and automate the process of generating LLM post-training runs. This skill assists users in selecting the most appropriate post-training method, suc…

  10. RESEARCH · CL_44028

    New method stabilizes LLM reasoning by rescuing near-boundary signals

    Researchers have identified a key bottleneck in Reinforcement Learning from Verifiable Rewards (RLVR) that hinders LLM reasoning optimization. The study pinpoints rigid clipping decisions in standard hard-clipping metho…

  11. TOOL · CL_35221

    New PRISM framework corrects SFT flaws in multimodal LLM training

    New research from institutions including the Hong Kong University of Science and Technology (Guangzhou) reveals a critical flaw in the common post-training paradigm for multimodal large language models (MLLMs). The stan…

  12. RESEARCH · CL_09211

    IBM releases Granite 4.1 LLMs with 512K context and Apache 2.0 license

    IBM has released the Granite 4.1 family of large language models, comprising 3B, 8B, and 30B parameter versions. These models were trained on approximately 15 trillion tokens through a five-stage pre-training process th…