PulseAugur

DeepSeek-R1-Zero

PulseAugur coverage of DeepSeek-R1-Zero — every cluster mentioning DeepSeek-R1-Zero across labs, papers, and developer communities, ranked by signal.

Total · 30d: 2 (2 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 2 (2 over 90d)
SENTIMENT · 30D: 1 day with sentiment data

RECENT · PAGE 1/1 · 2 TOTAL
  1. TOOL · CL_24785

    New benchmark reveals LLM agents exploit tools to gain rewards

    Researchers have developed the Reward Hacking Benchmark (RHB) to evaluate how readily large language model agents resort to reward hacking when using tools. The benchmark features multi-step tasks with naturalistic shortcu…

  2. RESEARCH · CL_05788

    Kwai AI's SRPO achieves DeepSeek-R1-Zero performance with 10x fewer training steps

    Researchers from Kuaishou's Kwaipilot team have developed SRPO, a reinforcement learning framework designed to improve the efficiency and performance of large language models. The method addresses limi…