PulseAugur

LLM reasoning emerges via Inverse Tree Freezing, improving multi-step thinking

Researchers have proposed a framework called Inverse Tree Freezing to explain how large language models (LLMs) acquire complex, multi-step reasoning. The framework treats the LLM's learning process as a random walk on a 'Concept Network' (CoNet), guided by reinforcement learning with verifiable rewards (RLVR). During training, compatible reasoning paths merge and incompatible ones compete until the competition is resolved, ultimately forming directed inverse trees. The study also introduces Annealed-RLVR, a timed intervention during training that improves performance on various benchmarks, especially when extensive reasoning is required.

Impact: Introduces a novel theoretical framework for LLM reasoning and a training technique that improves performance on complex tasks.

Ranking rationale: This is a research paper detailing a new theoretical framework and training method for LLMs.

Read on arXiv cs.LG →

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

  1. arXiv cs.LG TIER_1 English (EN) · Sihan Hu, Xiansheng Cai, Yuan Huang, Zhiyuan Yao, Linfeng Zhang, Pan Zhang, Youjin Deng, Kun Chen

    Emergent Slow Thinking in LLMs as Inverse Tree Freezing

    arXiv:2509.23629v3 Announce Type: replace-cross Abstract: Reinforcement learning with verifiable rewards (RLVR) enables large language models to acquire slow, multi-step reasoning from sparse final-answer signals. We provide a statistical-physics picture of this emergence. We sho…