Learning to reason with LLMs

OpenAI's o1 model demonstrates advanced reasoning, while Google and Apple explore new LLM training methods.

By the PulseAugur editorial team · [50 sources] · 2024-09-12 10:02

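OpenAI has released OpenAI o1-preview, an early version of its new o1 model, whose reasoning improves significantly on GPT-4o. The model performs strongly on competitive programming, advanced mathematics exams, and difficult science benchmarks, surpassing human experts in some areas. OpenAI attributes the gains to a large-scale reinforcement learning algorithm that teaches the model to think productively using chain of thought, with performance improving as both train-time and test-time compute increase.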
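The claim that accuracy can rise with test-time compute can be illustrated with a toy simulation. The sketch below is not OpenAI's method, which the summary describes only as reinforcement learning over chain of thought. It swaps in majority voting over repeated samples, one simple and well-known way to spend extra inference compute. The `noisy_solver` function, the 40% per-sample accuracy, and the four wrong options are all hypothetical values chosen for illustration.

```python
import random
from collections import Counter


def noisy_solver(rng, correct_answer, p_correct, n_wrong_options=4):
    """Hypothetical stand-in for one sampled model answer.

    Returns the correct answer with probability p_correct; otherwise
    returns one of several distinct wrong answers, chosen uniformly.
    """
    if rng.random() < p_correct:
        return correct_answer
    wrong = [correct_answer + k for k in range(1, n_wrong_options + 1)]
    return rng.choice(wrong)


def majority_vote(rng, correct_answer, p_correct, n_samples):
    """Draw n_samples answers and return the most frequent one."""
    votes = Counter(
        noisy_solver(rng, correct_answer, p_correct) for _ in range(n_samples)
    )
    return votes.most_common(1)[0][0]


def accuracy(n_samples, p_correct=0.4, trials=2000, seed=0):
    """Estimate how often the majority answer is correct over many trials."""
    rng = random.Random(seed)
    hits = sum(
        majority_vote(rng, 42, p_correct, n_samples) == 42 for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    # More samples per question means more test-time compute.
    for n in (1, 5, 25, 125):
        print(n, round(accuracy(n), 3))
```

In this toy setting the voted answer becomes more reliable as the number of samples grows, because the wrong answers split their votes while the correct one accumulates them. Real models do not make independent, evenly spread errors, so the curve here only conveys the shape of the idea.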

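Impact: This new model sets a higher bar for reasoning and could accelerate the development of more sophisticated AI agents and tools across many fields.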

Ranking rationale: OpenAI announced a new model, OpenAI o1-preview, with significantly improved reasoning, and has made it available through ChatGPT and the API.

Read at OpenAI News →

AI-generated summary · Google Gemini · from 50 sources.


Coverage sources [50]

Google AI / Research TIER_1 English(EN) · 2026-03-04 20:29

Teaching LLMs to reason like Bayesians

Generative AI
OpenAI News TIER_1 English(EN) · 2024-09-12 10:02

Learning to reason with LLMs
Apple Machine Learning Research TIER_1 English(EN) · 2026-04-28 00:00

LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning

Large Language Models (LLMs) demonstrate their reasoning ability through chain-of-thought (CoT) generation. However, LLM’s autoregressive decoding may limit the ability to revisit and refine earlier tokens in a holistic manner, which can also lead to inefficient exploration for d…
Hugging Face Blog TIER_1 English(EN) · 2025-09-10 00:00

Jupyter Agents: training LLMs to reason with notebooks
arXiv cs.CL TIER_1 English(EN) · Leon Hamm, Zlatan Ajanovic · 2026-05-08 04:00

Novelty-based Tree-of-Thought Search for LLM Reasoning and Planning

arXiv:2605.06040v1 Announce Type: cross Abstract: Although advances such as chain-of-thought, tree-of-thought or reinforcement learning have improved the performance of LLMs in reasoning and planning tasks, they are still brittle and have not achieved human-level performance in m…
arXiv cs.AI TIER_1 English(EN) · Jiahui Zhou, Dan Li, Boxin Li, Xiao Zhang, Erli Meng, Lin Li, Zhuomin Chen, Jian Lou, See-Kiong Ng · 2026-05-08 04:00

Time Series Reasoning via Process-Verifiable Thinking Data Synthesis and Scheduling for Tailored LLM Reasoning

arXiv:2602.07830v2 Announce Type: replace Abstract: Time series is a pervasive data type across various application domains, rendering the reasonable solving of diverse time series tasks a long-standing goal. Recent advances in large language models (LLMs), especially their reaso…
arXiv cs.CL TIER_1 English(EN) · Ömer Faruk Akgül, Rajgopal Kannan, Willie Neiswanger, Viktor Prasanna · 2026-05-08 04:00

Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning

arXiv:2605.06241v1 Announce Type: new Abstract: Reinforcement learning has become the standard for improving reasoning in large language models, yet evidence increasingly suggests that RL does not teach new strategies; it redistributes probability mass over solutions the base mod…
arXiv cs.CL TIER_1 English(EN) · Viktor Prasanna · 2026-05-07 13:25

Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning

Reinforcement learning has become the standard for improving reasoning in large language models, yet evidence increasingly suggests that RL does not teach new strategies; it redistributes probability mass over solutions the base model already contains. In this work, we ask: if RL…
arXiv cs.CL TIER_1 English(EN) · Zlatan Ajanovic · 2026-05-07 11:28

Novelty-based Tree-of-Thought Search for LLM Reasoning and Planning

Although advances such as chain-of-thought, tree-of-thought or reinforcement learning have improved the performance of LLMs in reasoning and planning tasks, they are still brittle and have not achieved human-level performance in many domains, and often suffer from high time and t…
arXiv cs.LG TIER_1 English(EN) · Yiming Huang, Zhenbo Shi, Xin-Cheng Wen, Jichuan Zeng, Cuiyun Gao, Peiyi Han, Chuanyi Liu · 2026-05-07 04:00

Free Energy-Driven Reinforcement Learning with Adaptive Advantage Shaping for Unsupervised Reasoning in LLMs

arXiv:2605.04065v1 Announce Type: cross Abstract: Unsupervised reinforcement learning (RL) has emerged as a promising paradigm for enabling self-improvement in large language models (LLMs). However, existing unsupervised RL-based methods often lack the capacity to adapt to the mo…
arXiv cs.LG TIER_1 English(EN) · Yiming Huang, Zhenbo Shi, Shuzheng Gao, Cuiyun Gao, Peiyi Han, Chuanyi Liu · 2026-05-07 04:00

Adapt to Thrive! Adaptive Power-Mean Policy Optimization for Improved LLM Reasoning

arXiv:2605.04066v1 Announce Type: cross Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) is an essential paradigm that enhances the reasoning capabilities of Large Language Models (LLMs). However, existing methods typically rely on static policy optimization scheme…
arXiv cs.AI TIER_1 English(EN) · Ruiqing Zhao, Fengzhi Li, Yuan Zuo, Rui Liu, Yansong Liu, Yunfei Ma, Fanyu Meng, Junlan Feng · 2026-05-06 04:00

Strategy-Aware Optimization Modeling with Reasoning LLMs

arXiv:2605.02545v1 Announce Type: new Abstract: Large language models (LLMs) can generate syntactically valid optimization programs, yet often struggle to reliably choose an effective modeling strategy, leading to incorrect formulations and inefficient solver behavior. We propose…
arXiv cs.CL TIER_1 English(EN) · Arash Ahmadi, Sarah Sharif, Yaser (Mike) Banad · 2026-05-05 04:00

Enhanced LLM Reasoning by Optimizing Reward Functions with Search-Driven Reinforcement Learning

arXiv:2605.02073v1 Announce Type: new Abstract: Mathematical reasoning is a key benchmark for large language models. Reinforcement learning is a standard post-training mechanism for improving the reasoning capabilities of large language models, yet performance remains sensitive t…
arXiv cs.AI TIER_1 English(EN) · Junlan Feng · 2026-05-04 12:48

Strategy-Aware Optimization Modeling with Reasoning LLMs

Large language models (LLMs) can generate syntactically valid optimization programs, yet often struggle to reliably choose an effective modeling strategy, leading to incorrect formulations and inefficient solver behavior. We propose SAGE, a strategy-aware framework that makes Mod…
arXiv cs.LG TIER_1 English(EN) · Zihan Lin, Xiaohan Wang, Jie Cao, Jiajun Chai, Li Wang, Xiaodong Lu, Wei Lin, Ran He, Guojun Yin · 2026-05-04 04:00

ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning

arXiv:2605.00380v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) enhances reasoning of Large Language Models (LLMs) but usually exhibits limited generation diversity due to the over-incentivization of positive rewards. Although methods like Ne…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-03 22:01

Enhanced LLM Reasoning by Optimizing Reward Functions with Search-Driven Reinforcement Learning

Mathematical reasoning is a key benchmark for large language models. Reinforcement learning is a standard post-training mechanism for improving the reasoning capabilities of large language models, yet performance remains sensitive to the design of the reward function that drives …
arXiv cs.CL TIER_1 English(EN) · Banad · 2026-05-03 22:01

Enhanced LLM Reasoning by Optimizing Reward Functions with Search-Driven Reinforcement Learning

Mathematical reasoning is a key benchmark for large language models. Reinforcement learning is a standard post-training mechanism for improving the reasoning capabilities of large language models, yet performance remains sensitive to the design of the reward function that drives …
arXiv cs.AI TIER_1 English(EN) · Wilder Baldwin, Sepideh Ghanavati · 2026-05-01 04:00

Knowledge Graph Representations for LLM-Based Policy Compliance Reasoning

arXiv:2604.27713v1 Announce Type: new Abstract: The risks posed by AI features are increasing as they are rapidly integrated into software applications. In response, regulations and standards for safe and secure AI have been proposed. In this paper, we present an agentic framewor…
arXiv cs.CL TIER_1 English(EN) · Garvin Kruthof · 2026-05-01 04:00

Models Recall What They Violate: Constraint Adherence in Multi-Turn LLM Ideation

arXiv:2604.28031v1 Announce Type: new Abstract: When researchers iteratively refine ideas with large language models, do the models preserve fidelity to the original objective? We introduce DriftBench, a benchmark for evaluating constraint adherence in multi-turn LLM-assisted sci…
arXiv cs.AI TIER_1 English(EN) · Feiyu Wu, Xu Zheng, Zhuocheng Wang, Yi ming Dai, Hui Li · 2026-05-01 04:00

RHyVE: Competence-Aware Verification and Phase-Aware Deployment for LLM-Generated Reward Hypotheses

arXiv:2604.28056v1 Announce Type: new Abstract: Large language models (LLMs) make reward design in reinforcement learning substantially more scalable, but generated rewards are not automatically reliable training objectives. Existing work has focused primarily on generating, evol…
arXiv cs.CL TIER_1 English(EN) · Byeongjin Kim, Gyuwan Kim, Seo Yeon Park · 2026-05-01 04:00

PPA-Plan: Proactive Pitfall Avoidance for Reliable Planning in Long-Context LLM Reasoning

arXiv:2601.11908v2 Announce Type: replace Abstract: Large language models (LLMs) struggle with reasoning over long contexts where relevant information is sparsely distributed. Although plan-and-execute frameworks mitigate this by decomposing tasks into planning and execution, the…
arXiv cs.CL TIER_1 English(EN) · Guojun Yin · 2026-05-01 03:57

ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning

Reinforcement Learning with Verifiable Rewards (RLVR) enhances reasoning of Large Language Models (LLMs) but usually exhibits limited generation diversity due to the over-incentivization of positive rewards. Although methods like Negative Sample Reinforcement (NSR) mitigate this …
arXiv cs.AI TIER_1 English(EN) · Hui Li · 2026-04-30 16:01

RHyVE: Competence-Aware Verification and Phase-Aware Deployment for LLM-Generated Reward Hypotheses

Large language models (LLMs) make reward design in reinforcement learning substantially more scalable, but generated rewards are not automatically reliable training objectives. Existing work has focused primarily on generating, evolving, or selecting reward candidates, while payi…
arXiv cs.CL TIER_1 English(EN) · Garvin Kruthof · 2026-04-30 15:46

Models Recall What They Violate: Constraint Adherence in Multi-Turn LLM Ideation

When researchers iteratively refine ideas with large language models, do the models preserve fidelity to the original objective? We introduce DriftBench, a benchmark for evaluating constraint adherence in multi-turn LLM-assisted scientific ideation. Across 2,146 scored benchmark …
Hugging Face Daily Papers TIER_1 English(EN) · 2026-04-30 15:46

Models Recall What They Violate: Constraint Adherence in Multi-Turn LLM Ideation

When researchers iteratively refine ideas with large language models, do the models preserve fidelity to the original objective? We introduce DriftBench, a benchmark for evaluating constraint adherence in multi-turn LLM-assisted scientific ideation. Across 2,146 scored benchmark …
arXiv cs.AI TIER_1 English(EN) · Sepideh Ghanavati · 2026-04-30 10:57

Knowledge Graph Representations for LLM-Based Policy Compliance Reasoning

The risks posed by AI features are increasing as they are rapidly integrated into software applications. In response, regulations and standards for safe and secure AI have been proposed. In this paper, we present an agentic framework that constructs knowledge graphs (KGs) from AI…
arXiv cs.CL TIER_1 English(EN) · Zhenyu Zhao, Sander Land, Dan Bikel, Waseem Alshikh · 2026-04-30 04:00

Shorthand for Thought: Compressing LLM Reasoning via Entropy-Guided Supertokens

arXiv:2604.26355v1 Announce Type: new Abstract: Reasoning in Large Language Models incurs significant inference-time compute, yet the token-level information structure of reasoning traces remains underexplored. We observe that reasoning tokens split into two functional types: low…
arXiv cs.CL TIER_1 English(EN) · Waseem Alshikh · 2026-04-29 07:06

Shorthand for Thought: Compressing LLM Reasoning via Entropy-Guided Supertokens

Reasoning in Large Language Models incurs significant inference-time compute, yet the token-level information structure of reasoning traces remains underexplored. We observe that reasoning tokens split into two functional types: low-entropy structural tokens (recurring p…
arXiv cs.LG TIER_1 English(EN) · Bojie Li · 2026-04-29 04:00

Incompressible Knowledge Probes: Estimating Black-Box LLM Parameter Counts via Factual Capacity

arXiv:2604.24827v1 Announce Type: new Abstract: Closed-source frontier labs do not disclose parameter counts, and the standard alternative -- inference economics -- carries 2×+ uncertainty from hardware, batching, and serving-stack assumptions external to the model. We exp…
arXiv cs.CL TIER_1 English(EN) · James Pustejovsky, Nikhil Krishnaswamy · 2026-04-29 04:00

Frictive Policy Optimization for LLMs: Epistemic Intervention, Risk-Sensitive Control, and Reflective Alignment

arXiv:2604.25136v1 Announce Type: new Abstract: We propose Frictive Policy Optimization (FPO), a framework for learning language model policies that regulate not only what to say, but when and how to intervene in order to manage epistemic and normative risk. Unlike standard align…
arXiv cs.CL TIER_1 English(EN) · Xiang Liu, Xuming Hu, Xiaowen Chu, Eunsol Choi · 2026-04-29 04:00

DiffAdapt: Difficulty-Adaptive Reasoning for Token-Efficient LLM Inference

arXiv:2510.19669v4 Announce Type: replace Abstract: Recent reasoning Large Language Models (LLMs) demonstrate remarkable problem-solving abilities but often generate long thinking traces whose utility is unclear. Our work aims to improve their efficiency, enabling them to reach h…
arXiv cs.CL TIER_1 English(EN) · Liaoyaqi Wang, Chunsheng Zuo, William Jurayj, Benjamin Van Durme, Anqi Liu · 2026-04-28 04:00

Process Supervision of Confidence Margin for Calibrated LLM Reasoning

arXiv:2604.23333v1 Announce Type: cross Abstract: Scaling test-time computation with reinforcement learning (RL) has emerged as a reliable path to improve large language models (LLM) reasoning ability. Yet, outcome-based reward often incentivizes models to be overconfident, leadi…
arXiv cs.CL TIER_1 English(EN) · Tomer Ashuach, Shai Gretz, Yoav Katz, Yonatan Belinkov, Liat Ein-Dor · 2026-04-28 04:00

Masked by Consensus: Disentangling Privileged Knowledge in LLM Correctness

arXiv:2604.12373v5 Announce Type: replace Abstract: Humans use introspection to evaluate their understanding through private internal states inaccessible to external observers. We investigate whether large language models possess similar privileged knowledge about answer correctn…
arXiv cs.CL TIER_1 English(EN) · Alexis Limozin, Eduard Durech, Torsten Hoefler, Imanol Schlag, Valentina Pyatkin · 2026-04-28 04:00

SFT-then-RL Outperforms Mixed-Policy Methods for LLM Reasoning

arXiv:2604.23747v1 Announce Type: cross Abstract: Recent mixed-policy optimization methods for LLM reasoning that interleave or blend supervised and reinforcement learning signals report improvements over the standard SFT-then-RL pipeline. We show that numerous recently published…
arXiv cs.AI TIER_1 English(EN) · Hong Wang, Zhezheng Hao, Jian Luo, Chenxing Wei, Yao Shu, Lei Liu, Qiang Lin, Hande Dong, Jiawei Chen · 2026-04-28 04:00

Scheduling Your LLM Reinforcement Learning with Reasoning Trees

arXiv:2510.24832v2 Announce Type: replace Abstract: Using Reinforcement Learning with Verifiable Rewards (RLVR) to optimize Large Language Models (LLMs) can be conceptualized as progressively editing a query's "Reasoning Tree". This process involves exploring nodes (tokens) and d…
arXiv cs.AI TIER_1 English(EN) · Kaiyang Wan, Lang Gao, Honglin Mu, Preslav Nakov, Yuxia Wang, Xiuying Chen · 2026-04-28 04:00

A Fano-Style Accuracy Upper Bound for LLM Single-Pass Reasoning in Multi-Hop QA

arXiv:2509.21199v3 Announce Type: replace Abstract: Multi-Hop Question Answering (MHQA) requires integrating dispersed, interdependent evidence through sequential reasoning under noise. This task is challenging for LLMs as they have a finite per-pass output capacity, beyond which…
arXiv cs.AI TIER_1 English(EN) · Shuxu Chen, Yitian Zhou, Jiaquan Zhang, Haoyu Bian, Aming Wu, Sungyoung Lee, Chaoning Zhang, Hyundong Shin · 2026-04-28 04:00

CAP-CoT: Cycle Adversarial Prompt for Improving Chain of Thoughts in LLM Reasoning

arXiv:2604.23270v1 Announce Type: new Abstract: Chain-of-Thought (CoT) prompting has emerged as a simple and effective way to elicit step-by-step solutions from large language models (LLMs). However, CoT reasoning can be unstable across runs on long, multi-step problems, leading …
arXiv cs.AI TIER_1 English(EN) · Junyan Cheng, Kyle Richardson, Peter Chin · 2026-04-28 04:00

Analytica: Soft Propositional Reasoning for Robust and Scalable LLM-Driven Analysis

arXiv:2604.23072v1 Announce Type: new Abstract: Large language model (LLM) agents are increasingly tasked with complex real-world analysis (e.g., in financial forecasting, scientific discovery), yet their reasoning suffers from stochastic instability and lacks a verifiable, compo…
arXiv cs.CL TIER_1 English(EN) · Qibin Wang, Pu Zhao, Shaohan Huang, Fangkai Yang, Lu Wang, Furu Wei, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang · 2026-04-28 04:00

Learning to Refine: Self-Refinement of Parallel Reasoning in LLMs

arXiv:2509.00084v2 Announce Type: replace-cross Abstract: Test-time scaling (TTS) has gained widespread attention for enhancing LLM reasoning. Existing approaches such as Best-of-N and majority voting are limited as their performance depends on the quality of candidate responses,…
arXiv cs.CL TIER_1 English(EN) · Chengshuai Zhao, Zhen Tan, Pingchuan Ma, Dawei Li, Bohan Jiang, Yancheng Wang, Yingzhen Yang, Huan Liu · 2026-04-28 04:00

Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

arXiv:2508.01191v5 Announce Type: replace-cross Abstract: Chain-of-Thought (CoT) prompting has been shown to be effective in eliciting structured reasoning (i.e., CoT reasoning) from large language models (LLMs). Regardless of its popularity, recent studies expose its failures in…
arXiv cs.CL TIER_1 English(EN) · Nikhil Krishnaswamy · 2026-04-28 02:24

Frictive Policy Optimization for LLMs: Epistemic Intervention, Risk-Sensitive Control, and Reflective Alignment

We propose Frictive Policy Optimization (FPO), a framework for learning language model policies that regulate not only what to say, but when and how to intervene in order to manage epistemic and normative risk. Unlike standard alignment methods that optimize surface-level prefere…
arXiv cs.LG TIER_1 English(EN) · Md Muntaqim Meherab, Noor Islam S. Mohammad, Faiza Feroz · 2026-04-27 04:00

Causal Concept Graphs in LLM Latent Space for Stepwise Reasoning

arXiv:2603.10377v2 Announce Type: replace Abstract: Sparse autoencoders can localize where concepts live in language models, but not how they interact during multi-step reasoning. We propose Causal Concept Graphs (CCG): a directed acyclic graph over sparse, interpretable latent f…
arXiv cs.LG TIER_1 English(EN) · Yigit Ihlamur · 2026-04-23 12:05

CoFEE: Reasoning Control for LLM-Based Feature Discovery

Feature discovery from complex unstructured data is fundamentally a reasoning problem: it requires identifying abstractions that are predictive of a target outcome while avoiding leakage, proxies, and post-outcome signals. With the introduction of ever-improving Large Language Mo…
arXiv cs.CL TIER_1 English(EN) · Nicholas Kluge Corrêa · 2026-04-23 09:13

Reasoning Primitives in Hybrid and Non-Hybrid LLMs

Reasoning in large language models is often treated as a monolithic capability, but its observed gains may arise from more basic operations. We study reasoning through two such primitives, recall and state-tracking, and ask whether hybrid architectures that combine attention-base…
Ahead of AI (Sebastian Raschka) TIER_1 English(EN) · Sebastian Raschka, PhD · 2026-01-24 11:23

Categories of Inference-Time Scaling for Improved LLM Reasoning

And an Overview of Recent Inference-Scaling Papers
Ahead of AI (Sebastian Raschka) TIER_1 English(EN) · Sebastian Raschka, PhD · 2025-04-19 11:02

The State of Reinforcement Learning for LLM Reasoning

Understanding GRPO and New Insights from Reasoning Model Papers
Ahead of AI (Sebastian Raschka) TIER_1 English(EN) · Sebastian Raschka, PhD · 2025-03-08 12:11

The State of LLM Reasoning Model Inference

Inference-Time Compute Scaling Methods to Improve Reasoning Models
Ahead of AI (Sebastian Raschka) TIER_1 English(EN) · Sebastian Raschka, PhD · 2025-02-05 12:11

Understanding Reasoning LLMs

Methods and Strategies for Building and Refining Reasoning Models
arXiv stat.ML TIER_1 English(EN) · Patrick Rebeschini · 2026-04-20 15:38

Knowing When to Quit: A Principled Framework for Dynamic Abstention in LLM Reasoning

Large language models (LLMs) using chain-of-thought reasoning often waste substantial compute by producing long, incorrect responses. Abstention can mitigate this by withholding outputs unlikely to be correct. While most abstention methods decide to withhold outputs before or aft…
Smol AINews TIER_1 English(EN) · 2025-07-08 05:44

SmolLM3: the SOTA 3B reasoning open source LLM

HuggingFace released SmolLM3-3B, a fully open-source small reasoning model with open pretraining code and data, marking a high point in open source models until Olmo 3 arrives. Grok 4 was launched with mixed reactions, while concerns about Claude 4 nerfs and a…

