CopT framework reverses LLM reasoning, boosting accuracy and efficiency

By PulseAugur Editorial · [1 source] · 2026-05-19 16:28

Researchers have introduced CopT, a reasoning framework for large language models that reverses the usual order of thinking and answering. Instead of generating a chain of thought before answering, CopT first elicits a draft answer and then uses on-policy thinking to reflect on and correct it. The method uses continuous embeddings as contrastive verifiers to assess how reliable an answer is. It is reported to improve accuracy by up to 23% and reduce token usage by up to 57% across various reasoning tasks, without requiring additional training.
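The answer-first control flow described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the scalar reliability score, and the threshold are hypothetical, and the verifier is a plain stand-in callable rather than the continuous-embedding contrastive verifier the paper describes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Result:
    answer: str
    used_thinking: bool


def answer_then_think(
    question: str,
    draft: Callable[[str], str],
    reliability: Callable[[str, str], float],
    revise: Callable[[str, str], str],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Result:
    """Answer first; spend thinking tokens only if the draft looks unreliable."""
    first = draft(question)
    if reliability(question, first) >= threshold:
        # Draft accepted as-is: no reasoning tokens are spent.
        return Result(first, used_thinking=False)
    # Draft rejected: reflect on it and produce a corrected answer.
    return Result(revise(question, first), used_thinking=True)


# Toy stand-ins so the sketch runs without a model (all hypothetical).
def toy_draft(question: str) -> str:
    return "4" if question == "2 + 2" else "unknown"


def toy_reliability(question: str, answer: str) -> float:
    return 0.0 if answer == "unknown" else 1.0


def toy_revise(question: str, answer: str) -> str:
    return "revised: " + question
```

In this sketch an easy question returns the draft directly, while a question whose draft scores below the threshold triggers the revision step. That is the intuition behind saving tokens when the first answer is already reliable.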

IMPACT Answering first and reasoning afterward could make LLM applications more accurate while using fewer tokens.

RANK_REASON The cluster contains a new academic paper detailing a novel method for LLM reasoning.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.AI TIER_1 English(EN) · Wenke Lee · 2026-05-19 16:28

CopT: Contrastive On-Policy Thinking with Continuous Spaces for General and Agentic Reasoning

Chain-of-thought (CoT) is a standard approach for eliciting reasoning capabilities from large language models (LLMs). However, the common CoT paradigm treats thinking as a prerequisite for answering, which can delay access to plausible answers and incur unnecessary token costs ev…
