PulseAugur

New methods boost LLM reasoning efficiency with compressed CoT

Researchers have proposed new methods to make chain-of-thought (CoT) reasoning in large language models more efficient. HybridThinker introduces a training scheme that balances retaining detailed thought steps with compressing them into memory tokens, and reports state-of-the-art accuracy at similar inference times. HMPO is a cost-effective, single-stage reinforcement learning framework that adaptively compresses CoT, and reports significant token reduction across various tasks and model sizes with negligible accuracy loss. A third study examines the memory regimes of CoT and looped Transformers, showing that compressed loops are limited by the size of their recurrent state, whereas full sequence-state loops and CoT scratchpads are not.
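The memory contrast in the third study can be pictured with a toy sketch. This is an illustration only, not the paper's construction or bounds: the function names, the `state_size` parameter, and the keep-the-most-recent eviction policy are all assumptions made for the example. A scratchpad keeps every intermediate token in context, while a fixed-size recurrent state can only hold a bounded number of them.

```python
from collections import deque


def scratchpad_recall(stream, query_index):
    """CoT-style memory: every intermediate token stays in the context."""
    context = []
    for token in stream:
        context.append(token)  # memory grows with the number of steps
    return context[query_index], len(context)


def compressed_loop_recall(stream, query_index, state_size):
    """Fixed-size state (hypothetical policy: keep only the newest tokens)."""
    state = deque(maxlen=state_size)  # older tokens are evicted automatically
    for token in stream:
        state.append(token)
    oldest_kept = len(stream) - len(state)  # index of the oldest surviving token
    if query_index < oldest_kept:
        return None, len(state)  # the queried token no longer fits in the state
    return state[query_index - oldest_kept], len(state)


stream = list(range(10))
print(scratchpad_recall(stream, 2))
print(compressed_loop_recall(stream, 2, state_size=4))
```

In this toy setting the scratchpad can answer any recall query at the cost of memory that grows with the trace, while the fixed-size state fails once the queried item has been evicted.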

IMPACT These advancements in CoT compression and memory management could lead to more capable and efficient LLMs for complex reasoning tasks.

RANK_REASON Multiple research papers introducing novel techniques for improving LLM reasoning efficiency.

AI-generated summary · Google Gemini · from 5 sources.

COVERAGE [5]

  1. arXiv cs.CL TIER_1 English(EN) · Xin Liu, Runsong Zhao, Xinyu Liu, Junhao Ruan, Pengcheng Huang, Shichao Dong, Chunyang Xiao, Chenglong Wang, Changliang Li, Jingbo Zhu, Tong Xiao ·

    HybridThinker: Efficient Chain-of-Thought Reasoning via Compressed Memory and Transient Thought Steps

    arXiv:2606.03768v1 (announce type: new). Abstract: Extended chain-of-thought (CoT) traces improve LLM reasoning but incur substantial computational and memory costs. While existing CoT compression methods mitigate this by condensing thought steps into compact representations via mem…

  2. arXiv cs.CL TIER_1 English(EN) · Tong Xiao ·

    HybridThinker: Efficient Chain-of-Thought Reasoning via Compressed Memory and Transient Thought Steps

    Extended chain-of-thought (CoT) traces improve LLM reasoning but incur substantial computational and memory costs. While existing CoT compression methods mitigate this by condensing thought steps into compact representations via memory tokens and retaining only these representati…

  3. arXiv cs.CL TIER_1 English(EN) · Minghui Zheng, Hongxu Chen, Huimin Ren, Hongsheng Xin, Xiaoyang Qu, Ze Wang, Shuling Yang, Ziyu Peng, Kaike Zhang, Pan Zhou, Kun Zhan ·

    HMPO: Hybrid Median-length Policy Optimization for Chain-of-Thought Compression

    arXiv:2606.01934v1 (announce type: cross). Abstract: Large language models achieve remarkable performance via extended chain-of-thought (CoT) reasoning, yet this lengthy process incurs substantial inference overhead. Existing CoT compression methods struggle with inflexible manual l…

  4. arXiv cs.CL TIER_1 English(EN) · Kun Zhan ·

    HMPO: Hybrid Median-length Policy Optimization for Chain-of-Thought Compression

    Large language models achieve remarkable performance via extended chain-of-thought (CoT) reasoning, yet this lengthy process incurs substantial inference overhead. Existing CoT compression methods struggle with inflexible manual length budgets, computationally expensive multi-sta…

  5. arXiv cs.LG TIER_1 English(EN) · Haozhou Zhang ·

    Chain-of-Thought and Compressed Looped Transformers: A Memory-Budget Separation

    arXiv:2605.30757v1 (announce type: new). Abstract: Chain-of-thought prompting and looped Transformers both give a fixed model more test-time computation, but they differ in what they remember. Chain-of-thought stores intermediate state in generated tokens that remain in the context,…