CopT framework reverses LLM reasoning, boosting accuracy and efficiency

By PulseAugur Editorial · Summary by gemini-2.5-flash-lite from 1 source

Researchers have introduced CopT, a reasoning framework for large language models that reverses the usual order of thinking and answering. Instead of generating a chain of thought before the answer, CopT first elicits a draft answer and then uses on-policy thinking to reflect on and correct it. The method uses continuous embeddings as contrastive verifiers to assess whether the draft answer is reliable. Across a range of reasoning tasks it improves accuracy by up to 23% and reduces token usage by up to 57%, without requiring additional training.
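The answer-first control flow described above can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`draft`, `reliability`, `think`), the `threshold` parameter, and the idea of reducing the verifier to a single score are all assumptions made for illustration. In particular, the contrastive verifier built on continuous embeddings is abstracted here as an opaque scoring callable.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Result:
    answer: str
    used_thinking: bool  # True if the draft was revised by a thinking pass


def answer_first(
    question: str,
    draft: Callable[[str], str],               # hypothetical: returns a quick draft answer
    reliability: Callable[[str, str], float],  # hypothetical: stand-in for the verifier, higher = more reliable
    think: Callable[[str, str], str],          # hypothetical: reflects on the draft and returns a corrected answer
    threshold: float = 0.5,                    # assumed cut-off, not taken from the paper
) -> Result:
    """Draft an answer first, and spend thinking tokens only if the draft looks unreliable."""
    candidate = draft(question)
    score = reliability(question, candidate)
    if score >= threshold:
        # Draft judged reliable: return it without a thinking pass.
        return Result(answer=candidate, used_thinking=False)
    # Draft judged unreliable: reflect on it and return the revised answer.
    return Result(answer=think(question, candidate), used_thinking=True)


if __name__ == "__main__":
    # Toy stubs standing in for model calls.
    result = answer_first(
        "What is 2 + 2?",
        draft=lambda q: "5",
        reliability=lambda q, a: 0.1,
        think=lambda q, a: "4",
    )
    print(result)
```

In this sketch, token savings would come from the early-return branch, where no thinking pass is run, while accuracy gains would depend on how well the verifier separates reliable drafts from unreliable ones.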

IMPACT This new reasoning approach could lead to more efficient and accurate LLM applications by optimizing the thinking and answering process.

RANK_REASON The cluster contains a new academic paper detailing a novel method for LLM reasoning.

large language models

COVERAGE [1]

arXiv cs.AI TIER_1 · Wenke Lee · 2026-05-19 16:28

CopT: Contrastive On-Policy Thinking with Continuous Spaces for General and Agentic Reasoning

Chain-of-thought (CoT) is a standard approach for eliciting reasoning capabilities from large language models (LLMs). However, the common CoT paradigm treats thinking as a prerequisite for answering, which can delay access to plausible answers and incur unnecessary token costs ev…
