OpenAI's o1 model shows advanced reasoning, while Google and Apple explore new LLM training methods.
By PulseAugur Editorial·
Summary by gemini-2.5-flash-lite
from 50 sources
OpenAI has released an early version of its new model, OpenAI o1-preview, which demonstrates significant improvements in reasoning capabilities compared to GPT-4o. The model excels in competitive programming, advanced math exams, and complex scientific benchmarks, surpassing human expert performance in some areas. This advancement is attributed to a large-scale reinforcement learning algorithm that teaches the model to think productively using a chain of thought, with performance scaling with both training and test-time compute.
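OpenAI has not published o1's internals, but one well-known way to see why test-time compute helps is self-consistency: sample several independent chains of thought and take a majority vote over their final answers. The sketch below is a loose illustration of that idea, not OpenAI's method; `generate_cot` is a placeholder for any sampling-enabled LLM API, and the answer format is an assumption.

```python
from collections import Counter

def generate_cot(question: str, temperature: float = 0.8) -> str:
    """Placeholder for any LLM call that returns a chain-of-thought
    trace ending in a line like 'Answer: 42'. Swap in a real API."""
    raise NotImplementedError

def extract_answer(trace: str) -> str:
    # Take the text after the last 'Answer:' marker in the trace.
    return trace.rsplit("Answer:", 1)[-1].strip()

def self_consistency(question: str, n_samples: int = 16) -> str:
    """Majority vote over n independently sampled reasoning traces.
    Accuracy typically rises with n_samples, i.e. with test-time compute."""
    answers = [extract_answer(generate_cot(question)) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```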
AI
IMPACT
This new model sets a higher bar for reasoning capabilities, potentially accelerating the development of more sophisticated AI agents and tools across various domains.
RANK_REASON
OpenAI announced a new model, OpenAI o1-preview, with significant reasoning improvements and released it for use in ChatGPT and via API.
Large Language Models (LLMs) demonstrate their reasoning ability through chain-of-thought (CoT) generation. However, LLMs' autoregressive decoding may limit the ability to revisit and refine earlier tokens in a holistic manner, which can also lead to inefficient exploration for d…
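The limitation this abstract points to follows directly from how autoregressive decoding works: each new token is conditioned on a frozen prefix, so an early mistake propagates. A minimal greedy-decoding loop makes this explicit; `next_token_logits` is a hypothetical stub for any LM forward pass.

```python
def next_token_logits(prefix: list[int]) -> list[float]:
    """Placeholder for a forward pass of any autoregressive LM."""
    raise NotImplementedError

def greedy_decode(prompt_ids: list[int], max_new: int = 128, eos: int = 0) -> list[int]:
    ids = list(prompt_ids)
    for _ in range(max_new):
        logits = next_token_logits(ids)
        tok = max(range(len(logits)), key=logits.__getitem__)  # argmax
        ids.append(tok)  # the prefix is never revisited or edited
        if tok == eos:
            break
    return ids
```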
arXiv:2605.06040v1 Announce Type: cross Abstract: Although advances such as chain-of-thought, tree-of-thought or reinforcement learning have improved the performance of LLMs in reasoning and planning tasks, they are still brittle and have not achieved human-level performance in many domains, and often suffer from high time and t…
arXiv cs.AI
TIER_1·Jiahui Zhou, Dan Li, Boxin Li, Xiao Zhang, Erli Meng, Lin Li, Zhuomin Chen, Jian Lou, See-Kiong Ng·
arXiv:2602.07830v2 Announce Type: replace Abstract: Time series is a pervasive data type across various application domains, rendering the reasonable solving of diverse time series tasks a long-standing goal. Recent advances in large language models (LLMs), especially their reaso…
arXiv:2605.06241v1 Announce Type: new Abstract: Reinforcement learning has become the standard for improving reasoning in large language models, yet evidence increasingly suggests that RL does not teach new strategies; it redistributes probability mass over solutions the base model already contains. In this work, we ask: if RL…
arXiv:2605.04065v1 Announce Type: cross Abstract: Unsupervised reinforcement learning (RL) has emerged as a promising paradigm for enabling self-improvement in large language models (LLMs). However, existing unsupervised RL-based methods often lack the capacity to adapt to the mo…
arXiv:2605.04066v1 Announce Type: cross Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) is an essential paradigm that enhances the reasoning capabilities of Large Language Models (LLMs). However, existing methods typically rely on static policy optimization scheme…
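In RLVR, the reward comes from a programmatic verifier rather than a learned reward model, which is what makes it "verifiable". A minimal sketch for math-style tasks follows; the boxed-answer format convention is an assumption for illustration.

```python
import re

def verifiable_reward(completion: str, gold: str) -> float:
    """Binary reward: 1.0 if the model's boxed final answer matches the
    reference exactly, else 0.0. The \\boxed{...} format is an assumption."""
    m = re.search(r"\\boxed\{([^}]*)\}", completion)
    if m is None:
        return 0.0
    return 1.0 if m.group(1).strip() == gold.strip() else 0.0
```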
arXiv:2605.02545v1 Announce Type: new Abstract: Large language models (LLMs) can generate syntactically valid optimization programs, yet often struggle to reliably choose an effective modeling strategy, leading to incorrect formulations and inefficient solver behavior. We propose SAGE, a strategy-aware framework that makes Mod…
arXiv:2605.02073v1 Announce Type: new Abstract: Mathematical reasoning is a key benchmark for large language models. Reinforcement learning is a standard post-training mechanism for improving the reasoning capabilities of large language models, yet performance remains sensitive to the design of the reward function that drives …
arXiv cs.LG
TIER_1·Zihan Lin, Xiaohan Wang, Jie Cao, Jiajun Chai, Li Wang, Xiaodong Lu, Wei Lin, Ran He, Guojun Yin·
arXiv:2605.00380v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) enhances reasoning of Large Language Models (LLMs) but usually exhibits limited generation diversity due to the over-incentivization of positive rewards. Although methods like Negative Sample Reinforcement (NSR) mitigate this …
arXiv:2604.27713v1 Announce Type: new Abstract: The risks posed by AI features are increasing as they are rapidly integrated into software applications. In response, regulations and standards for safe and secure AI have been proposed. In this paper, we present an agentic framework that constructs knowledge graphs (KGs) from AI…
arXiv:2604.28031v1 Announce Type: new Abstract: When researchers iteratively refine ideas with large language models, do the models preserve fidelity to the original objective? We introduce DriftBench, a benchmark for evaluating constraint adherence in multi-turn LLM-assisted scientific ideation. Across 2,146 scored benchmark …
arXiv cs.AI
TIER_1·Feiyu Wu, Xu Zheng, Zhuocheng Wang, Yi ming Dai, Hui Li·
arXiv:2604.28056v1 Announce Type: new Abstract: Large language models (LLMs) make reward design in reinforcement learning substantially more scalable, but generated rewards are not automatically reliable training objectives. Existing work has focused primarily on generating, evolving, or selecting reward candidates, while payi…
arXiv cs.CL
TIER_1·Byeongjin Kim, Gyuwan Kim, Seo Yeon Park·
arXiv:2601.11908v2 Announce Type: replace Abstract: Large language models (LLMs) struggle with reasoning over long contexts where relevant information is sparsely distributed. Although plan-and-execute frameworks mitigate this by decomposing tasks into planning and execution, the…
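Plan-and-execute frameworks like the one above split a long-context task into a planning call and per-step execution calls. A generic two-stage sketch, not this paper's specific method, with `generate` as a placeholder for any LLM completion API:

```python
def generate(prompt: str) -> str:
    """Placeholder for any LLM completion API."""
    raise NotImplementedError

def plan_and_execute(task: str, context: str) -> str:
    # Stage 1: produce a numbered plan without attending to the full context.
    plan = generate(f"Break this task into numbered steps:\n{task}")
    # Stage 2: execute each step separately, re-grounding in the context.
    results = [
        generate(f"Context:\n{context}\n\nDo exactly this step:\n{step}")
        for step in plan.splitlines() if step.strip()
    ]
    # Stage 3: combine the per-step results into one answer.
    return generate(f"Task: {task}\nStep results:\n" + "\n".join(results)
                    + "\nCombine these into a final answer.")
```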
arXiv cs.CL
TIER_1·Zhenyu Zhao, Sander Land, Dan Bikel, Waseem Alshikh·
arXiv:2604.26355v1 Announce Type: new Abstract: Reasoning in Large Language Models incurs significant inference-time compute, yet the token-level information structure of reasoning traces remains underexplored. We observe that reasoning tokens split into two functional types: low-entropy "structural" tokens (recurring p…
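The structural/informative split can be measured directly from the model's next-token distributions. A PyTorch sketch of per-token entropy computation; the split threshold is an arbitrary assumption for illustration, not the paper's value.

```python
import torch
import torch.nn.functional as F

def token_entropies(logits: torch.Tensor) -> torch.Tensor:
    """logits: [seq_len, vocab] next-token logits along a reasoning trace.
    Returns the Shannon entropy (nats) of each predictive distribution."""
    logp = F.log_softmax(logits, dim=-1)
    return -(logp.exp() * logp).sum(dim=-1)

def split_tokens(logits: torch.Tensor, threshold: float = 1.0):
    """Entropy below threshold -> 'structural' (recurring connective
    phrases); above -> 'informative'. Threshold is an assumption."""
    h = token_entropies(logits)
    return h < threshold, h >= threshold
```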
arXiv:2604.24827v1 Announce Type: new Abstract: Closed-source frontier labs do not disclose parameter counts, and the standard alternative -- inference economics -- carries 2×+ uncertainty from hardware, batching, and serving-stack assumptions external to the model. We exp…
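The inference-economics approach rests on the rough approximation that a dense transformer's forward pass costs about 2N FLOPs per token for N parameters; serving assumptions then dominate the error. A back-of-envelope sketch with made-up numbers showing how an assumed utilization alone yields the 2× spread:

```python
def params_from_throughput(tokens_per_sec: float,
                           flops_per_sec: float,
                           utilization: float) -> float:
    """Invert FLOPs/token ~= 2N. All inputs are illustrative guesses."""
    return flops_per_sec * utilization / (2.0 * tokens_per_sec)

# Example: 1e15 peak FLOP/s at 40% vs 80% assumed utilization already
# implies a 2x spread in the estimated parameter count.
low = params_from_throughput(2000, 1e15, 0.40)   # 1.0e11 params
high = params_from_throughput(2000, 1e15, 0.80)  # 2.0e11 params
```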
arXiv:2604.25136v1 Announce Type: new Abstract: We propose Frictive Policy Optimization (FPO), a framework for learning language model policies that regulate not only what to say, but when and how to intervene in order to manage epistemic and normative risk. Unlike standard alignment methods that optimize surface-level prefere…
arXiv:2510.19669v4 Announce Type: replace Abstract: Recent reasoning Large Language Models (LLMs) demonstrate remarkable problem-solving abilities but often generate long thinking traces whose utility is unclear. Our work aims to improve their efficiency, enabling them to reach h…
arXiv cs.CL
TIER_1·Liaoyaqi Wang, Chunsheng Zuo, William Jurayj, Benjamin Van Durme, Anqi Liu·
arXiv:2604.23333v1 Announce Type: cross Abstract: Scaling test-time computation with reinforcement learning (RL) has emerged as a reliable path to improve large language models (LLM) reasoning ability. Yet, outcome-based reward often incentivizes models to be overconfident, leadi…
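A standard remedy for the overconfidence problem the abstract describes (not necessarily this paper's objective) is to reward the model's stated confidence with a proper scoring rule rather than correctness alone. A minimal Brier-style sketch:

```python
def calibrated_reward(correct: bool, stated_confidence: float) -> float:
    """Negative Brier score: expected reward is maximized by reporting
    the true probability of being correct, so blanket overconfidence
    on hard questions is penalized."""
    y = 1.0 if correct else 0.0
    return -(stated_confidence - y) ** 2
```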
arXiv:2604.12373v5 Announce Type: replace Abstract: Humans use introspection to evaluate their understanding through private internal states inaccessible to external observers. We investigate whether large language models possess similar privileged knowledge about answer correctn…
arXiv:2604.23747v1 Announce Type: cross Abstract: Recent mixed-policy optimization methods for LLM reasoning that interleave or blend supervised and reinforcement learning signals report improvements over the standard SFT-then-RL pipeline. We show that numerous recently published…
arXiv cs.AI
TIER_1·Hong Wang, Zhezheng Hao, Jian Luo, Chenxing Wei, Yao Shu, Lei Liu, Qiang Lin, Hande Dong, Jiawei Chen·
arXiv:2510.24832v2 Announce Type: replace Abstract: Using Reinforcement Learning with Verifiable Rewards (RLVR) to optimize Large Language Models (LLMs) can be conceptualized as progressively editing a query's "Reasoning Tree". This process involves exploring nodes (tokens) and d…
arXiv:2509.21199v3 Announce Type: replace Abstract: Multi-Hop Question Answering (MHQA) requires integrating dispersed, interdependent evidence through sequential reasoning under noise. This task is challenging for LLMs as they have a finite per-pass output capacity, beyond which…
arXiv:2604.23270v1 Announce Type: new Abstract: Chain-of-Thought (CoT) prompting has emerged as a simple and effective way to elicit step-by-step solutions from large language models (LLMs). However, CoT reasoning can be unstable across runs on long, multi-step problems, leading …
arXiv cs.AI
TIER_1·Junyan Cheng, Kyle Richardson, Peter Chin·
arXiv:2604.23072v1 Announce Type: new Abstract: Large language model (LLM) agents are increasingly tasked with complex real-world analysis (e.g., in financial forecasting, scientific discovery), yet their reasoning suffers from stochastic instability and lacks a verifiable, compo…
arXiv cs.CL
TIER_1·Qibin Wang, Pu Zhao, Shaohan Huang, Fangkai Yang, Lu Wang, Furu Wei, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang·
arXiv:2509.00084v2 Announce Type: replace-cross Abstract: Test-time scaling (TTS) has gained widespread attention for enhancing LLM reasoning. Existing approaches such as Best-of-N and majority voting are limited as their performance depends on the quality of candidate responses,…
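Best-of-N and majority voting differ in what they need: the former a scorer over candidates, the latter only an answer extractor. A minimal Best-of-N sketch with a placeholder scorer, illustrating why performance is capped by candidate and scorer quality as the abstract notes:

```python
def score(question: str, candidate: str) -> float:
    """Placeholder for a reward model or verifier; any scorer fits here."""
    raise NotImplementedError

def best_of_n(question: str, candidates: list[str]) -> str:
    # Return the candidate the scorer likes best; if every candidate is
    # wrong, or the scorer misranks them, selection cannot recover.
    return max(candidates, key=lambda c: score(question, c))
```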
arXiv:2508.01191v5 Announce Type: replace-cross Abstract: Chain-of-Thought (CoT) prompting has been shown to be effective in eliciting structured reasoning (i.e., CoT reasoning) from large language models (LLMs). Despite its popularity, recent studies expose its failures in…
arXiv cs.LG
TIER_1·Md Muntaqim Meherab, Noor Islam S. Mohammad, Faiza Feroz·
arXiv:2603.10377v2 Announce Type: replace Abstract: Sparse autoencoders can localize where concepts live in language models, but not how they interact during multi-step reasoning. We propose Causal Concept Graphs (CCG): a directed acyclic graph over sparse, interpretable latent f…
Feature discovery from complex unstructured data is fundamentally a reasoning problem: it requires identifying abstractions that are predictive of a target outcome while avoiding leakage, proxies, and post-outcome signals. With the introduction of ever-improving Large Language Mo…
Reasoning in large language models is often treated as a monolithic capability, but its observed gains may arise from more basic operations. We study reasoning through two such primitives, recall and state-tracking, and ask whether hybrid architectures that combine attention-base…
Ahead of AI (Sebastian Raschka)
TIER_1·Sebastian Raschka, PhD·
Large language models (LLMs) using chain-of-thought reasoning often waste substantial compute by producing long, incorrect responses. Abstention can mitigate this by withholding outputs unlikely to be correct. While most abstention methods decide to withhold outputs before or aft…
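A simple post-hoc baseline for abstention thresholds the model's own sequence likelihood; the abstract's method decides mid-generation, but the before/after version it contrasts against looks roughly like this sketch (the threshold value is an assumption):

```python
import math

def mean_token_logprob(token_logprobs: list[float]) -> float:
    return sum(token_logprobs) / max(len(token_logprobs), 1)

def answer_or_abstain(answer: str, token_logprobs: list[float],
                      threshold: float = math.log(0.5)) -> str | None:
    """Withhold the answer when the average per-token log-probability
    falls below a tuned threshold; None signals abstention."""
    return answer if mean_token_logprob(token_logprobs) >= threshold else None
```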
**HuggingFace** released **SmolLM3-3B**, a fully open-source small reasoning model with open pretraining code and data, marking a high point in open source models until **Olmo 3** arrives. **Grok 4** was launched with mixed reactions, while concerns about **Claude 4** nerfs and a…