New Research Examines Reasoning, Instruction Following, and Self-Correction in Large Language Models (LLMs)

By the PulseAugur editorial team · [8 sources] · 2025-10-22 00:00

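Several recent research papers examine the internal mechanisms and reasoning abilities of large reasoning models (LRMs). One paper, since withdrawn, proposes Entropy-Gradient Inversion and a related optimization technique (CorR-PO), which aim to improve reasoning by relating token entropy to logit gradients. Another withdrawn paper, LambdaPO, aims to strengthen reinforcement learning alignment by rethinking advantage estimation to obtain finer-grained preference signals. A third paper introduces Convex Compositional Energy Minimization (CCEM) to address non-convexity in compositional reasoning models, allowing them to transfer to larger problem instances. Finally, a study of the "hidden critique ability" of LRMs identifies a "critique vector" that can improve error detection and self-correction without additional training.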
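For readers unfamiliar with the two quantities named in the first paper, the sketch below computes them for a single next-token prediction: the Shannon entropy of the softmax distribution, and the gradient of the cross-entropy loss with respect to the logits (softmax probabilities minus the one-hot target). This is a minimal illustration using standard textbook definitions, not the Entropy-Gradient Inversion or CorR-PO method itself, whose details are not given in this summary. The function names and toy logits are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def cross_entropy_logit_grad(logits, target_index):
    """Gradient of cross-entropy loss w.r.t. the logits: softmax(logits) - one_hot(target)."""
    probs = softmax(logits)
    return [p - (1.0 if i == target_index else 0.0) for i, p in enumerate(probs)]


def l2_norm(vector):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(v * v for v in vector))


if __name__ == "__main__":
    # Hypothetical toy logits over a 4-token vocabulary (illustrative only).
    examples = {
        "confident": [8.0, 0.0, 0.0, 0.0],
        "uncertain": [1.0, 1.0, 1.0, 1.0],
    }
    for name, logits in examples.items():
        entropy = token_entropy(logits)
        grad_norm = l2_norm(cross_entropy_logit_grad(logits, target_index=0))
        print(f"{name}: entropy={entropy:.4f} nats, logit-gradient norm={grad_norm:.4f}")
```

In this toy case the peaked distribution has both lower entropy and a smaller logit-gradient norm than the uniform one when the target is the favored token; how the paper relates the two quantities during reasoning is not specified here.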

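Impact: New research explores methods for improving LLM reasoning, instruction following, and self-correction, which could lead to more reliable and controllable AI systems.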

Ranking rationale: Several arXiv papers detail new methods and analyses for large reasoning models.

Read on Hugging Face Daily Papers →

AI-generated summary · Google Gemini · from 8 sources.

Sources [8]

arXiv cs.AI TIER_1 English(EN) · Junyao Yang, Chen Qian, Kun Wang, Linfeng Zhang, Quanshi Zhang, Yong Liu, Dongrui Liu · 2026-05-25 04:00

Entropy-Gradient Inversion: Moving Toward Internal Mechanism of Large Reasoning Models

arXiv:2605.17770v2 Announce Type: replace Abstract: The advancement of Large Reasoning Models (LRMs) has catalyzed a paradigm shift from reactive "fast thinking" text generation to systematic, step-by-step "slow thinking" reasoning, unlocking state-of-the-art performance in c…
arXiv cs.CL TIER_1 English(EN) · Zhe Yuan, Yipeng Zhou, Jinghan Li, Xinyuan Chen, Bowen Deng, Zhiqian Chen, Liang Zhao · 2026-05-25 04:00

LambdaPO: A Lambda Style Policy Optimization for Reasoning Language Models

arXiv:2605.19416v2 Announce Type: replace Abstract: Group Relative Policy Optimization (GRPO) has become a cornerstone of modern reinforcement learning alignment, prized for its efficacy in foregoing an explicit value-critic by leveraging reward normalization across sampled trajec…
arXiv cs.LG TIER_1 English(EN) · Meir Roketlishvili, Semyon Semenov, Maksim Bobrin, Viktor Kovalchuk, Albert Baichorov, Abduragim Shtanchaev, Fakhri Karray, Dmitry V. Dylov, Martin Takáč, Arip Asadulaev · 2026-05-25 04:00

Convex Compositional Reasoning Models

arXiv:2605.23395v1 Announce Type: new Abstract: Compositional energy-based models can generalize to larger combinatorial reasoning problems by reusing a learned factor energy across many local constraints. In our paper, we show that a key bottleneck in compositional reasoning is …
arXiv cs.LG TIER_1 English(EN) · Hoang Phan, Quang H. Nguyen, Hung T. Q. Le, Xiusi Chen, Heng Ji, Khoa D. Doan · 2026-05-25 04:00

Decoding the Critique Mechanism in Large Reasoning Models

arXiv:2603.16331v2 Announce Type: replace Abstract: Large Reasoning Models (LRMs) exhibit backtracking and self-verification mechanisms that enable them to revise intermediate steps and reach correct solutions, yielding strong performance on complex logical benchmarks. We hypothe…
arXiv cs.LG TIER_1 English(EN) · Arip Asadulaev · 2026-05-22 09:04

Convex Compositional Reasoning Models

Compositional energy-based models can generalize to larger combinatorial reasoning problems by reusing a learned factor energy across many local constraints. In our paper, we show that a key bottleneck in compositional reasoning is not composition itself, but the non-convex geome…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-20 00:00

Equilibrium Reasoners: Learning Attractors Enables Scalable Reasoning

Equilibrium Reasoners enable scalable reasoning through task-conditioned attractors that guide latent dynamical systems toward valid solutions, achieving significant accuracy improvements through iterative test-time computation.
Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-19 06:10

LambdaPO: A Lambda Style Policy Optimization for Reasoning Language Models

Group Relative Policy Optimization (GRPO) has become a cornerstone of modern reinforcement learning alignment, prized for its efficacy in foregoing an explicit value-critic by leveraging reward normalization across sampled trajectory cohorts. However, the method's reliance on a mo…
Together AI blog TIER_1 English(EN) · 2025-10-22 00:00

Large Reasoning Models Fail to Follow Instructions During Reasoning: A Benchmark Study

ReasonIF finds frontier LRMs fail to follow reasoning instructions >75% of the time; introduces a benchmark across languages, formatting, and length.
