CopT: Contrastive On-Policy Thinking with Continuous Spaces for General and Agentic Reasoning

CopT framework reverses the order of LLM reasoning, improving accuracy and efficiency

By PulseAugur Editorial · [1 source] · 2026-05-19 16:28

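Researchers have introduced CopT, a novel reasoning framework for large language models that reverses the conventional think-then-answer order. Rather than generating a reasoning trace before giving an answer, CopT first elicits a draft answer, then uses on-policy learning to reflect on and revise it. The method uses continuous embeddings as a contrastive verifier to assess how reliable the answer is. Without any additional training, it improves accuracy by 23% and reduces token usage by 57% across a range of reasoning tasks.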
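The answer-first control flow described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the summary does not say how the verifier is computed, so `contrastive_score`, `answer_then_think`, the `draft`/`verify`/`revise` callables, and the `threshold` and `max_rounds` parameters are all hypothetical names chosen for this example.

```python
import math
from typing import Callable, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def contrastive_score(
    answer_vec: Sequence[float],
    positive_vec: Sequence[float],
    negative_vec: Sequence[float],
) -> float:
    """Hypothetical reliability score (an assumption, not the paper's formula):
    similarity to a 'reliable' reference minus similarity to an 'unreliable' one."""
    return cosine(answer_vec, positive_vec) - cosine(answer_vec, negative_vec)


def answer_then_think(
    question: str,
    draft: Callable[[str], str],          # produces a draft answer with no prior thinking
    verify: Callable[[str, str], float],  # scores (question, answer); higher = more reliable
    revise: Callable[[str, str], str],    # reflects on the draft and returns a revised answer
    threshold: float = 0.0,
    max_rounds: int = 2,
) -> Tuple[str, int]:
    """Draft first; spend thinking effort only while the verifier is unconvinced."""
    answer = draft(question)
    rounds = 0
    while verify(question, answer) < threshold and rounds < max_rounds:
        answer = revise(question, answer)
        rounds += 1
    return answer, rounds


# Toy usage with stub callables standing in for a real model.
final, used = answer_then_think(
    "2 + 2 = ?",
    draft=lambda q: "5",
    verify=lambda q, a: 1.0 if a == "4" else -1.0,
    revise=lambda q, a: "4",
)
print(final, used)
```

In this sketch, a draft that already satisfies the verifier returns immediately with zero revision rounds, which is one plausible way an answer-first scheme could save tokens on easy questions.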

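Impact: By optimizing the thinking and answering process, this new reasoning approach could lead to more efficient and more accurate LLM applications.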

Ranking rationale: This cluster contains an academic paper detailing a new method for LLM reasoning.


large language models

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · English · Wenke Lee · 2026-05-19 16:28

CopT: Contrastive On-Policy Thinking with Continuous Spaces for General and Agentic Reasoning

Chain-of-thought (CoT) is a standard approach for eliciting reasoning capabilities from large language models (LLMs). However, the common CoT paradigm treats thinking as a prerequisite for answering, which can delay access to plausible answers and incur unnecessary token costs ev…

