ENTITY DeepSeek-R1-Distill-Qwen-1.5B/7B

PulseAugur coverage of DeepSeek-R1-Distill-Qwen-1.5B/7B — every cluster mentioning DeepSeek-R1-Distill-Qwen-1.5B/7B across labs, papers, and developer communities, ranked by signal.

Total · 30d (1 over 90d)

Releases · 30d (0 over 90d)

Papers · 30d (1 over 90d)

TOPICS

paper 1
model release 1

SENTIMENT · 30D

1 day with sentiment data

RECENT · PAGE 1/1 · 1 TOTAL

TOOL · CL_79923 · Jun 9 · 04:00

New method TNT tackles reward hacking in reasoning models

Researchers have developed a new method called Thinking-Based Non-Thinking (TNT) to address reward hacking in hybrid reasoning models. This approach aims to optimize computational efficiency by enabling models to decide…
