A Verifiable Search Is Not a Learnable Chain-of-Thought

AI models struggle to learn backtracking search through chain-of-thought fine-tuning

By the PulseAugur editorial team · [2 sources] · 2026-06-20 00:00

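A new research paper examines the limits of teaching AI models complex reasoning tasks through chain-of-thought (CoT) fine-tuning. The study finds that models readily learn forward-computable tasks but struggle with procedures that involve backtracking search, such as cryptarithmetic. Even after extensive fine-tuning and a range of methods including RL, the models failed to imitate the search procedure effectively; they learned to mimic specific steps without grasping the underlying logic. For tasks that require search, the study suggests that precomputing the solution and focusing on memorization and verification is more effective than trying to teach the search procedure itself.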
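To make the contrast between searching and verifying concrete, here is a minimal Python sketch of a cryptarithm. It is an illustration only: the puzzle (`TO + GO = OUT`), the function names, and the naive search order are assumptions made for this example, not the paper's task encoding or method. `verify` is a single forward pass over a proposed answer, while `solve` has to try digits, recurse, and undo failed choices.

```python
from typing import Dict, List, Optional


def word_value(word: str, assignment: Dict[str, int]) -> int:
    """Read a word as a base-10 number under a letter-to-digit assignment."""
    value = 0
    for ch in word:
        value = value * 10 + assignment[ch]
    return value


def verify(addends: List[str], total: str, assignment: Dict[str, int]) -> bool:
    """Forward check of a proposed answer: one pass, no search."""
    words = addends + [total]
    letters = {ch for w in words for ch in w}
    if not letters.issubset(assignment):
        return False  # some letter has no digit
    digits = [assignment[ch] for ch in letters]
    if len(set(digits)) != len(digits):
        return False  # two letters share a digit
    if any(assignment[w[0]] == 0 for w in words):
        return False  # leading zero
    lhs = sum(word_value(w, assignment) for w in addends)
    return lhs == word_value(total, assignment)


def solve(addends: List[str], total: str) -> Optional[Dict[str, int]]:
    """Naive backtracking search: assign a digit, recurse, undo on failure."""
    words = addends + [total]
    letters = sorted({ch for w in words for ch in w})
    leading = {w[0] for w in words}
    assignment: Dict[str, int] = {}
    used = set()

    def backtrack(i: int) -> bool:
        if i == len(letters):
            return verify(addends, total, assignment)
        letter = letters[i]
        for digit in range(10):
            if digit in used or (digit == 0 and letter in leading):
                continue
            assignment[letter] = digit
            used.add(digit)
            if backtrack(i + 1):
                return True
            # Dead end: undo this choice and try the next digit.
            del assignment[letter]
            used.discard(digit)
        return False

    return dict(assignment) if backtrack(0) else None


if __name__ == "__main__":
    # Hypothetical example puzzle; its solution is 21 + 81 = 102.
    print(solve(["TO", "GO"], "OUT"))
```

The asymmetry is the point of the sketch: checking a stored answer needs no undo step, whereas the search trace is dominated by abandoned branches. This solver checks the sum only at the leaves, so it is practical for small puzzles only.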

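Impact: Highlights a fundamental limitation of current AI training methods on complex reasoning tasks, suggesting a need for new approaches that go beyond simple imitation learning.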

Ranking rationale: A research paper published on arXiv that details limitations of AI model training.

Read on Hugging Face Daily Papers →

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.CL TIER_1 English(EN) · Harsh Patel · 2026-06-20 04:52

A Verifiable Search Is Not a Learnable Chain-of-Thought

It is tempting to assume any task solvable by a short program can be taught to a model as its chain-of-thought: write the steps out, fine-tune, and the model follows. This paper shows the assumption fails for an identifiable class of procedures. The testbed is nine reasoning task…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-20 00:00

A Verifiable Search Is Not a Learnable Chain-of-Thought

Training models on chain-of-thought demonstrations fails for tasks requiring backtracking search because the forward derivation cannot be faithfully imitated, demonstrating a fundamental limitation in learning search procedures through demonstration.
