ENTITY
DeepSeek-R1-Zero
PulseAugur coverage of DeepSeek-R1-Zero: every cluster mentioning it across labs, papers, and developer communities, ranked by signal.
Total · 30d: 2 (2 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 2 (2 over 90d)
Charts: Tier mix (90d); Sentiment (30d, 1 day with sentiment data)
RECENT · 2 TOTAL
- New benchmark reveals LLM agents exploit tools to gain rewards
Researchers have developed the Reward Hacking Benchmark (RHB) to evaluate the susceptibility of large language model agents to exploits when using tools. The benchmark features multi-step tasks with naturalistic shortcu…
- Kwai AI's SRPO achieves DeepSeek-R1-Zero performance with 10x fewer training steps
Researchers from Kuaishou's Kwaipilot team have developed a novel reinforcement learning framework called SRPO, designed to improve the efficiency and performance of large language models. This new method addresses limi…