Emergent Slow Thinking in LLMs as Inverse Tree Freezing

LLM Reasoning Emerges Through Inverse Tree Freezing, Improving Multi-Step Thinking

By the PulseAugur editorial team · [1 source] · 2026-05-08 04:00

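Researchers have developed a framework called "Inverse Tree Freezing" for understanding how large language models (LLMs) achieve complex reasoning. The model treats LLM learning as a random walk on a "Concept Network" (CoNet), guided by reinforcement learning with verifiable rewards (RLVR). During this process, compatible reasoning paths merge and incompatible paths compete until the competition is resolved, ultimately forming a directed inverse tree. The study also introduces "Annealed-RLVR", a timed intervention applied during training that improves performance across a range of benchmarks, particularly when extensive reasoning is required.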
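To make the picture of a reward-guided random walk settling into a tree more concrete, here is a minimal toy sketch in Python. It is not the paper's CoNet model or its training procedure. The graph layout (an acyclic graph with one correct-answer sink and one wrong-answer sink), the update rule, and all names (`train_toy_rlvr`, `freeze`, `reaches_answer`) are assumptions made purely for illustration.

```python
import random

def train_toy_rlvr(n_nodes=8, episodes=2000, lr=0.5, seed=0):
    """Toy reward-reinforced random walk on a small acyclic graph.

    Assumed layout (hypothetical, not from the paper): node 0 is the
    correct answer, node 1 is a wrong answer, and both are sinks.
    Every node i >= 2 may step to any lower-numbered node j < i.
    """
    rng = random.Random(seed)
    # weights[i][j]: unnormalised preference for stepping i -> j
    weights = {i: {j: 1.0 for j in range(i)} for i in range(2, n_nodes)}

    for _ in range(episodes):
        node = rng.randrange(2, n_nodes)   # random starting concept
        path = []
        while node >= 2:                   # walk until a sink is reached
            targets = list(weights[node])
            probs = [weights[node][j] for j in targets]
            nxt = rng.choices(targets, weights=probs, k=1)[0]
            path.append((node, nxt))
            node = nxt
        if node == 0:                      # sparse reward on the final answer only
            for i, j in path:
                weights[i][j] += lr        # reinforce edges of rewarded walks
    return weights

def freeze(weights):
    """Keep only the strongest outgoing edge of each node."""
    return {i: max(out, key=out.get) for i, out in weights.items()}

def reaches_answer(frozen, start, answer=0):
    """Follow frozen edges from `start`; True if they end at `answer`."""
    node, seen = start, set()
    while node in frozen and node not in seen:
        seen.add(node)
        node = frozen[node]
    return node == answer

weights = train_toy_rlvr()
frozen = freeze(weights)
```

In this toy, an edge leading directly into the wrong-answer sink is never reinforced, because no rewarded walk can contain it. As a result, the frozen edges from every node lead to node 0, which gives a directed tree whose edges all point toward the root. The acyclic layout is a simplifying assumption that rules out loops by construction; the summary above does not say how the actual framework handles them.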

Impact: Introduces a novel theoretical framework for LLM reasoning and a training technique that improves performance on complex tasks.

Ranking rationale: This is an academic paper detailing a new theoretical framework and training method for LLMs.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.LG · Sihan Hu, Xiansheng Cai, Yuan Huang, Zhiyuan Yao, Linfeng Zhang, Pan Zhang, Youjin Deng, Kun Chen · 2026-05-08 04:00

Emergent Slow Thinking in LLMs as Inverse Tree Freezing

arXiv:2509.23629v3. Abstract: Reinforcement learning with verifiable rewards (RLVR) enables large language models to acquire slow, multi-step reasoning from sparse final-answer signals. We provide a statistical-physics picture of this emergence. We sho…
