New research probes LLM reasoning, instruction following, and self-correction

By PulseAugur Editorial · [25 sources] · 2025-10-22 00:00

Several recent research papers explore the internal mechanisms and reasoning capabilities of Large Reasoning Models (LRMs). One paper, since withdrawn, proposed Entropy-Gradient Inversion and a related optimization technique (CorR-PO) that correlates token entropy with logit gradients to improve reasoning. Another withdrawn paper, LambdaPO, aimed to enhance reinforcement learning alignment by re-conceptualizing advantage estimation for finer-grained preference signals. A third paper introduced Convex Compositional Energy Minimization (CCEM) to address non-convexity in compositional reasoning models, enabling transfer to larger problem instances. A fourth study, on the "hidden critique ability" in LRMs, identified a "critique vector" that can improve error detection and self-correction without additional training. Finally, the ReasonIF benchmark study found that frontier LRMs fail to follow instructions during reasoning more than 75% of the time.

IMPACT New research explores methods to improve LLM reasoning, instruction following, and self-correction capabilities, potentially leading to more reliable and controllable AI systems.

RANK_REASON Multiple arXiv papers detailing new methods and analyses for large reasoning models.


AI-generated summary · Google Gemini · from 25 sources.

COVERAGE [25]

arXiv cs.AI TIER_1 English(EN) · Szymon Bobek, Łukasz Bałec, Grzegorz J. Nalepa · 2026-05-26 04:00

Actionable and diverse counterfactual explanations incorporating domain knowledge and plausibility constraints

arXiv:2511.20236v3 Announce Type: replace Abstract: Counterfactual explanations improve the actionable interpretability of machine learning models by identifying minimal changes required to achieve a desired outcome. However, existing methods often neglect dependencies among feat…
arXiv cs.LG TIER_1 English(EN) · Wenbo Pan, Zhichao Liu, Xianlong Wang, Haining Yu, Xiaohua Jia · 2026-05-26 04:00

Towards Long-Horizon Interpretability: Efficient and Faithful Multi-Token Attribution for Reasoning LLMs

arXiv:2602.01914v2 Announce Type: replace Abstract: Token attribution methods provide intuitive explanations for language model outputs by identifying causally important input tokens. However, as modern LLMs increasingly rely on extended reasoning chains, existing schemes face tw…
arXiv cs.CL TIER_1 English(EN) · Yuming Yang, Mingyoung Lai, Wanxu Zhao, Xiaoran Fan, Zhiheng Xi, Mingqi Wu, Chiyue Huang, Jun Zhao, Haijun Lv, Jian Tong, Yunhua Zhou, Yicheng Zou, Qipeng Guo, Tao Gui, Qi Zhang, Xuanjing Huang · 2026-05-26 04:00

Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment

arXiv:2601.14249v5 Announce Type: replace Abstract: Long chain-of-thought (CoT) trajectories provide rich supervision signals for distilling reasoning from teacher to student LLMs. However, both prior work and our experiments show that trajectories from stronger teachers do not n…
arXiv cs.CL TIER_1 English(EN) · Lisa Alazraki, Lihu Chen, Ana Brassard, Joe Stacey, Hossein A. Rahmani, Marek Rei · 2026-05-26 04:00

AgentCoMa: A Compositional Benchmark Mixing Commonsense and Mathematical Reasoning in Real-World Scenarios

arXiv:2508.19988v3 Announce Type: replace Abstract: Large Language Models (LLMs) have achieved high accuracy on complex commonsense and mathematical problems that involve the composition of multiple reasoning steps. However, current compositional benchmarks testing these skills t…
arXiv cs.CL TIER_1 English(EN) · Hui Xie, Jie Liu, Ziyue Qiao, Joaquin Vanschoren · 2026-05-26 04:00

Selective Latent Thinking: Adaptive Compression of LLM Reasoning Chains

arXiv:2605.25745v1 Announce Type: new Abstract: Explicit chain-of-thought (CoT) reasoning substantially improves the reasoning ability of large language models (LLMs), but incurs high inference cost due to lengthy autoregressive traces. Existing latent reasoning methods offer a p…
arXiv cs.CL TIER_1 English(EN) · Zongji Yu, Wenshui Luo, Yiliu Sun, Hao Fang, Runmin Cong, Chaochao Lu, Chen Gong · 2026-05-26 04:00

Harmony in Diversity: Multi-domain Contrastive Policy Optimization for Large Reasoning Models

arXiv:2605.25443v1 Announce Type: new Abstract: Post-training has significantly enhanced the reasoning capability of Large Reasoning Models (LRMs), especially with Reinforcement Learning (RL) like Group Relative Policy Optimization (GRPO). However, GRPO-style RL methods in multi-…
arXiv cs.CL TIER_1 English(EN) · Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Leszek Rutkowski, Dacheng Tao · 2026-05-26 04:00

Better, Faster: Harnessing Self-Improvement in Large Reasoning Models

arXiv:2605.24998v1 Announce Type: new Abstract: Self-improvement training enables the large reasoning models (LRMs) to improve themselves by self-generating reasoning trajectories as training data without external supervision. However, we find that this method often falls short i…
arXiv cs.AI TIER_1 English(EN) · Serafim Batzoglou · 2026-05-26 04:00

INDUCTION: Finite-Structure Concept Synthesis in First-Order Logic

arXiv:2602.18956v3 Announce Type: replace Abstract: We introduce INDUCTION, a benchmark for finite structure concept synthesis in first order logic. Given small finite relational worlds with extensionally labeled target predicates, models must output a single first order logical …
arXiv cs.AI TIER_1 English(EN) · Andreas Opedal, Francesco Ignazio Re, Abulhair Saparov, Mrinmaya Sachan, Bernhard Schölkopf, Ryan Cotterell · 2026-05-26 04:00

Learning to Reason Efficiently with A* Post-Training

arXiv:2605.24597v1 Announce Type: new Abstract: Many applications of large language models (LLMs) require deductive reasoning, yet models frequently produce incorrect or redundant inference steps. We frame natural language inference as a search problem where the final answer is t…
arXiv cs.AI TIER_1 English(EN) · Andrew Corbett, Archit Sood, Anna Tzatzopoulou, Sai-Aakash Ramesh, Tim Dodwell · 2026-05-26 04:00

Boosting Inference with Guided Reasoning: Stochastic Exploration for Recursive Models

arXiv:2605.25230v1 Announce Type: new Abstract: Recent work on recursive architectures has shown that tiny neural networks can be surprisingly powerful on structured reasoning tasks. The trick is to model reasoning trajectories with a latent dynamical system. We argue that the in…
arXiv cs.AI TIER_1 English(EN) · Hongbo Jin, Mingnan Zhu, Jingqi Tian, Xu Jiang, Zhongjing Du, Haoran Tang, Siyi Xie, Qiaoman Zhang, Jiayu Ding · 2026-05-26 04:00

Context-CoT: Enhancing Context Learning via High-Quality Reasoning Synthesis

arXiv:2605.25354v1 Announce Type: new Abstract: While LLMs excel at reasoning over prompts using static pretrained knowledge, they struggle significantly with context learning, the ability to dynamically extract, internalize, and apply new knowledge from complex, task-specific con…
arXiv cs.AI TIER_1 English(EN) · Qirun Dai, Xiao Liu, Jiawei Zhang, Dylan Zhang, Hao Peng, Chenhao Tan · 2026-05-26 04:00

Towards a Universal Causal Reasoner

arXiv:2605.24873v1 Announce Type: cross Abstract: Despite the importance of causal reasoning, training LLMs to reason causally remains underexplored. Existing data efforts mostly focus on benchmarking LLMs on specific aspects of causality, making them less suitable for training g…
arXiv cs.AI TIER_1 English(EN) · Thomas A. Buckley, Riccardo Conci, Peter G. Brodeur, Jason Gusdorf, Sourik Beltrán, Bita Behrouzi, Byron Crowe, Jacob Dockterman, Muzzammil Muhammad, Sarah Ohnigian, Andrew Sanchez, James A. Diao, Aashna P. Shah, Daniel Restrepo, Eric S. Rosenberg, And… · 2026-05-26 04:00

Teaching large language models to reason like expert diagnosticians

arXiv:2509.12194v2 Announce Type: replace Abstract: Differential diagnosis is an iterative process that integrates patient information with broader medical knowledge. Clinical case series such as the NEJM Clinicopathologic Conferences (CPCs), published continuously since 1923, fe…
arXiv cs.AI TIER_1 English(EN) · Mingyu Zhang, Lifeng Zhuo, Tianxi Tan, Guocan Xie, Xian Nie, Yan Li, Renjie Zhao, Zizhu He, Ziyu Wang, Jiting Cai, Yong-Lu Li · 2026-05-26 04:00

IPR-1: Interactive Physical Reasoner

arXiv:2511.15407v4 Announce Type: replace Abstract: Humans learn by observing, interacting with environments, and internalizing physics and causality. Here, we aim to ask whether an agent can similarly acquire human-like reasoning from interaction and keep improving with more exp…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-25 11:57

Selective Latent Thinking: Adaptive Compression of LLM Reasoning Chains

Explicit chain-of-thought (CoT) reasoning substantially improves the reasoning ability of large language models (LLMs), but incurs high inference cost due to lengthy autoregressive traces. Existing latent reasoning methods offer a promising alternative, yet they often treat reaso…
arXiv cs.AI TIER_1 English(EN) · Junyao Yang, Chen Qian, Kun Wang, Linfeng Zhang, Quanshi Zhang, Yong Liu, Dongrui Liu · 2026-05-25 04:00

Entropy-Gradient Inversion: Moving Toward Internal Mechanism of Large Reasoning Models

arXiv:2605.17770v2 Announce Type: replace Abstract: The advancement of Large Reasoning Models (LRMs) has catalyzed a paradigm shift from reactive "fast thinking" text generation to systematic, step-by-step "slow thinking" reasoning, unlocking state-of-the-art performance in c…
arXiv cs.LG TIER_1 English(EN) · Hoang Phan, Quang H. Nguyen, Hung T. Q. Le, Xiusi Chen, Heng Ji, Khoa D. Doan · 2026-05-25 04:00

Decoding the Critique Mechanism in Large Reasoning Models

arXiv:2603.16331v2 Announce Type: replace Abstract: Large Reasoning Models (LRMs) exhibit backtracking and self-verification mechanisms that enable them to revise intermediate steps and reach correct solutions, yielding strong performance on complex logical benchmarks. We hypothe…
arXiv cs.LG TIER_1 English(EN) · Meir Roketlishvili, Semyon Semenov, Maksim Bobrin, Viktor Kovalchuk, Albert Baichorov, Abduragim Shtanchaev, Fakhri Karray, Dmitry V. Dylov, Martin Takáč, Arip Asadulaev · 2026-05-25 04:00

Convex Compositional Reasoning Models

arXiv:2605.23395v1 Announce Type: new Abstract: Compositional energy-based models can generalize to larger combinatorial reasoning problems by reusing a learned factor energy across many local constraints. In our paper, we show that a key bottleneck in compositional reasoning is …
arXiv cs.CL TIER_1 English(EN) · Zhe Yuan, Yipeng Zhou, Jinghan Li, Xinyuan Chen, Bowen Deng, Zhiqian Chen, Liang Zhao · 2026-05-25 04:00

LambdaPO: A Lambda Style Policy Optimization for Reasoning Language Models

arXiv:2605.19416v2 Announce Type: replace Abstract: Group Relative Policy Optimization (GRPO) has become a cornerstone of modern reinforcement learning alignment, prized for its efficacy in foregoing an explicit value-critic by leveraging reward normalization across sampled trajec…
arXiv cs.LG TIER_1 English(EN) · Arip Asadulaev · 2026-05-22 09:04

Convex Compositional Reasoning Models

Compositional energy-based models can generalize to larger combinatorial reasoning problems by reusing a learned factor energy across many local constraints. In our paper, we show that a key bottleneck in compositional reasoning is not composition itself, but the non-convex geome…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-22 00:00

Decoding the Critique Mechanism in Large Reasoning Models

Large Reasoning Models demonstrate hidden critique abilities that allow error recovery through internal mechanisms, identified via interpretable critique vectors that enhance error detection without additional training.
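The summary above does not say how the critique vector is found or applied. As a rough illustration only, one common recipe in interpretability work is a difference-of-means direction over hidden states, added back to the activations at inference time. The sketch below assumes that recipe; it is not the paper's method, and the function names, the toy two-dimensional states, and the `alpha` scale are all hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def difference_of_means(positive, negative):
    """Direction pointing from the mean 'negative' state to the mean 'positive' state."""
    mean_pos = mean_vector(positive)
    mean_neg = mean_vector(negative)
    return [p - q for p, q in zip(mean_pos, mean_neg)]


def steer(hidden, direction, alpha=1.0):
    """Shift a hidden state along the direction; no model weights are changed."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy hidden states (assumed): states recorded while the model critiques vs. does not.
critique_states = [[1.0, 2.0], [3.0, 2.0]]
plain_states = [[0.0, 1.0], [2.0, 1.0]]

vector = difference_of_means(critique_states, plain_states)
steered = steer([0.5, 0.5], vector, alpha=2.0)
print(vector, steered)
```

Because the shift is applied to activations rather than learned, this kind of intervention fits the "without additional training" framing, though the actual procedure in the paper may differ.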
Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-20 00:00

Equilibrium Reasoners: Learning Attractors Enables Scalable Reasoning

Equilibrium Reasoners enable scalable reasoning through task-conditioned attractors that guide latent dynamical systems toward valid solutions, achieving significant accuracy improvements through iterative test-time computation.
Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-19 06:10

LambdaPO: A Lambda Style Policy Optimization for Reasoning Language Models

Group Relative Policy Optimization (GRPO) has become a cornerstone of modern reinforcement learning alignment, prized for its efficacy in foregoing an explicit value-critic by leveraging reward normalization across sampled trajectory cohorts. However, the method's reliance on a mo…
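For readers unfamiliar with the "reward normalization across sampled trajectory cohorts" that the abstract refers to, here is a minimal sketch of a GRPO-style group-relative advantage: each sampled response's reward is standardized against the mean and standard deviation of its own group, so no value critic is needed. This illustrates the baseline the abstract describes, not LambdaPO itself. The function name, the `eps` guard, and the choice of population standard deviation are assumptions; implementations differ on those details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward against its own sampled group (GRPO-style baseline)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, some code uses sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one prompt, scored by a binary verifier (toy data).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Note that when every response in a group receives the same reward, all advantages come out as zero and the group contributes no learning signal.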
arXiv cs.CV TIER_1 English(EN) · Fanhu Zeng, Zhicong Luo, Zefan Wang, You Li, Chi Chen, Maosong Sun · 2026-05-26 04:00

Does Seeing More Mean Knowing More? Mono-Anchored Advantage Normalization for Multi-Source Visual Reasoning

arXiv:2605.25437v1 Announce Type: new Abstract: Visual reasoning through reinforcement learning with verifiable rewards (RLVR) has achieved remarkable progress. However, when dealing with multi-source inputs, existing approaches tend to treat them as a mere accumulation of inform…
Together AI blog TIER_1 English(EN) · 2025-10-22 00:00

Large Reasoning Models Fail to Follow Instructions During Reasoning: A Benchmark Study

ReasonIF finds frontier LRMs fail to follow reasoning instructions >75% of the time; introduces a benchmark across languages, formatting, and length.
