Reconsidering Overthinking: Penalizing Internal and External Redundancy in CoT Reasoning

New method penalizes redundancy to make large language model reasoning more efficient

By the PulseAugur editorial team · [1 source] · 2026-06-30 04:00

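Researchers have developed a method that reduces "overthinking" in large reasoning models (LRMs) by penalizing internal and external redundancy in their Chain-of-Thought (CoT) traces. The dual-penalty reinforcement learning framework targets the two separately: internal redundancy is information stagnation before the first correct answer, and external redundancy is redundant continuation after it. In experiments on benchmarks such as GSM8K and MATH500, reasoning length drops substantially, by up to 41.3% on a 1.5B model, while accuracy stays competitive and overall efficiency improves. The method also transfers to out-of-domain tasks such as GPQA and LiveCodeBench, pointing toward more concise and more interpretable LRMs.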
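To make the internal/external split concrete, here is a minimal sketch in Python. It is not the paper's implementation: the summary above does not say how redundancy is measured or how the reward is shaped, so every detail here is an assumption for illustration. The function names, the substring test for "first correct answer", the word-overlap (Jaccard) proxy for information stagnation, and the weights `alpha` and `beta` are all hypothetical.

```python
from typing import List, Optional


def first_correct_index(steps: List[str], answer: str) -> Optional[int]:
    """Index of the first step containing the reference answer (assumed: substring match)."""
    for i, step in enumerate(steps):
        if answer in step:
            return i
    return None


def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two steps; a crude stand-in for semantic similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def internal_redundancy(steps: List[str], answer: str, threshold: float = 0.8) -> float:
    """Share of steps up to the first correct answer that near-duplicate an earlier step."""
    idx = first_correct_index(steps, answer)
    prefix = steps if idx is None else steps[: idx + 1]
    if not prefix:
        return 0.0
    repeats = sum(
        1
        for i in range(1, len(prefix))
        if any(jaccard(prefix[i], prefix[j]) >= threshold for j in range(i))
    )
    return repeats / len(prefix)


def external_redundancy(steps: List[str], answer: str) -> float:
    """Share of words produced after the first step that contains the correct answer."""
    idx = first_correct_index(steps, answer)
    lengths = [len(s.split()) for s in steps]
    total = sum(lengths)
    if idx is None or total == 0:
        return 0.0
    return sum(lengths[idx + 1:]) / total


def shaped_reward(steps: List[str], answer: str,
                  alpha: float = 0.5, beta: float = 0.5) -> float:
    """Hypothetical dual-penalty reward: correctness minus two weighted redundancy terms."""
    correct = 1.0 if first_correct_index(steps, answer) is not None else 0.0
    return (correct
            - alpha * internal_redundancy(steps, answer)
            - beta * external_redundancy(steps, answer))
```

Keeping the two penalties as separate terms mirrors the framing in the summary: a trace can be penalized for stalling before it reaches the answer, for rambling after it, or for both, and the two weights can be tuned independently.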

Impact: Lowers inference cost and improves the interpretability of large reasoning models.

Ranking rationale: This cluster contains an academic paper detailing a new method for improving the efficiency of large language model reasoning.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · Taihang Zhen, Jialiang Hong, Kai Chen, Guang Yang, Junlan Feng, Wenpeng Zhu, Jing Huo, Yang Gao, Depeng Wang, Haitao Wan, Xi Yang, Fanyu Meng, Yuyao Zhang, Ji Qi, Xiangyu Zhou · 2026-06-30 04:00

Reconsidering Overthinking: Penalizing Internal and External Redundancy in CoT Reasoning

arXiv:2508.02178v3 Announce Type: replace Abstract: Large reasoning models (LRMs) often exhibit overthinking, producing verbose Chain-of-Thought (CoT) traces that increase inference cost and obscure the underlying reasoning process. Existing CoT compression methods mainly rely on…

