LLM inference and reasoning techniques advance with new research and hardware

OpenAI News TIER_1 English(EN) · 2026-06-24 06:00

OpenAI and Broadcom unveil LLM-optimized inference chip

OpenAI and Broadcom introduce Jalapeño, a custom AI chip built for LLM inference to improve performance, efficiency, and scale across AI systems.

Google AI / Research TIER_1 English(EN) · 2026-03-04 20:29

Teaching LLMs to reason like Bayesians

Generative AI

Google AI / Research TIER_1 English(EN) · 2025-09-11 22:01

Speculative cascades — A hybrid approach for smarter, faster LLM inference

Generative AI

arXiv cs.CL TIER_1 English(EN) · Ádám Kovács, Nadia Verdha, Gábor Recski · 2026-07-03 04:00

RuleChef: Grounding LLM Task Knowledge in Human-Editable Rules

arXiv:2607.01293v1 Announce Type: new Abstract: We present RuleChef, a framework that uses large language models (LLMs) to generate executable rules for NLP tasks such as text classification, Named Entity Recognition (NER), or relation extraction. Rules are generated based on a t…

arXiv cs.AI TIER_1 English(EN) · Tingting Yu, Pei-Cing Huang, Chan Hsu, Chan-Tung Ku, Yihuang Kang · 2026-07-03 04:00

ADVENT: LLM-Driven Automatic Predicate Invention for ILP

arXiv:2607.01585v1 Announce Type: cross Abstract: Predicate invention (PI), the creation of new predicates to extend the hypothesis space, remains a critical bottleneck in Inductive Logic Programming (ILP). Existing methods rely on domain expertise and produce semantically opaque…

arXiv cs.AI TIER_1 English(EN) · Samir Abdaljalil, Erchin Serpedin, Hasan Kurban · 2026-07-03 04:00

IsoSci: A Benchmark of Isomorphic Cross-Domain Science Problems for Evaluating Reasoning versus Knowledge Retrieval in LLMs

arXiv:2607.01431v1 Announce Type: cross Abstract: We introduce ISOSCI, a benchmark of isomorphic cross-domain science problem pairs that separates reasoning ability from domain knowledge retrieval in LLM evaluation. Each pair shares identical logical structure but requires differ…

arXiv cs.AI TIER_1 English(EN) · Yanjun Zhao, Ruizhong Qiu, Tianxin Wei, Yuanchen Bei, Zhining Liu, Lingjie Chen, Ismini Lourentzou, Hanghang Tong, Jingrui He · 2026-07-03 04:00

ReContext: Recursive Evidence Replay as LLM Harness for Long-Context Reasoning

arXiv:2607.02509v1 Announce Type: new Abstract: Understanding and reasoning over long contexts has become a key requirement for deploying large language models (LLMs) in realistic applications. Although recent LLMs support increasingly long context windows, they often fail to use…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-07-02 17:59

ReContext: Recursive Evidence Replay as LLM Harness for Long-Context Reasoning

Understanding and reasoning over long contexts has become a key requirement for deploying large language models (LLMs) in realistic applications. Although recent LLMs support increasingly long context windows, they often fail to use relevant evidence that is already present in th…

arXiv cs.AI TIER_1 English(EN) · Jingrui He · 2026-07-02 17:59

ReContext: Recursive Evidence Replay as LLM Harness for Long-Context Reasoning

Understanding and reasoning over long contexts has become a key requirement for deploying large language models (LLMs) in realistic applications. Although recent LLMs support increasingly long context windows, they often fail to use relevant evidence that is already present in th…

arXiv cs.AI TIER_1 English(EN) · Woosung Koh, Juyoung Suk, Sungjun Han, Se-Young Yun, Jamin Shin · 2026-07-02 04:00

Predicting LLM Reasoning Performance with Small Proxy Model

arXiv:2509.21013v4 Announce Type: replace-cross Abstract: Given the prohibitive cost of pre-training large language models, it is essential to leverage smaller proxy models to optimize datasets before scaling up. However, this approach becomes challenging for reasoning capabiliti…

arXiv cs.AI TIER_1 English(EN) · Fatima Jahara, Mark Dredze, Sharon Levy · 2026-07-02 04:00

Evaluating Implicit Biases in LLM Reasoning through Logic Grid Puzzles

arXiv:2511.06160v2 Announce Type: replace Abstract: While recent safety guardrails effectively suppress overtly biased outputs, subtler forms of social bias emerge during complex logical reasoning tasks that evade current evaluation benchmarks. To fill this gap, we introduce a ne…

arXiv cs.CL TIER_1 English(EN) · Yao Dou, Benjamin Mamut, Wei Xu · 2026-07-02 04:00

Gavel: Agent Meets Checklist for Evaluating LLMs on Long-Context Legal Summarization

arXiv:2601.04424v2 Announce Type: replace Abstract: Large language models (LLMs) now support contexts of up to 1M tokens, but their strengths and weaknesses on complex long-context tasks remain unclear. To study this, we focus on multi-document legal case summarization, where a s…

arXiv cs.CL TIER_1 English(EN) · Yujia Hu, Tuan-Phong Nguyen, Shrestha Ghosh, Moritz Müller, Simon Razniewski · 2026-07-02 04:00

GPTKB v1.5: A Massive Knowledge Base for Exploring Factual LLM Knowledge

arXiv:2507.05740v2 Announce Type: replace Abstract: Language models are powerful artifacts, yet their factual knowledge is still poorly understood, and inaccessible to ad-hoc browsing and scalable statistical analysis. This demonstration introduces GPTKB v1.5, a densely interlink…

arXiv cs.CL TIER_1 English(EN) · Yihuang Kang · 2026-07-02 01:33

ADVENT: LLM-Driven Automatic Predicate Invention for ILP

Predicate invention (PI), the creation of new predicates to extend the hypothesis space, remains a critical bottleneck in Inductive Logic Programming (ILP). Existing methods rely on domain expertise and produce semantically opaque predicates, hindering adaptation to unfamiliar do…

arXiv cs.CL TIER_1 English(EN) · Hasan Kurban · 2026-07-01 19:49

IsoSci: A Benchmark of Isomorphic Cross-Domain Science Problems for Evaluating Reasoning versus Knowledge Retrieval in LLMs

We introduce ISOSCI, a benchmark of isomorphic cross-domain science problem pairs that separates reasoning ability from domain knowledge retrieval in LLM evaluation. Each pair shares identical logical structure but requires different domain-specific knowledge, enabling controlled…

arXiv cs.AI TIER_1 English(EN) · Zhaoyang Luo, Runmin Dong, Miao Yang, Fan Wei, Yushan Lai, Bin Luo, Haohuan Fu · 2026-07-01 04:00

Attend, Transform, or Silence: Operator-Level Visual Skipping for Efficient Multimodal LLM Inference

arXiv:2606.31903v1 Announce Type: cross Abstract: Multimodal large language models (MLLMs) increasingly process long visual-token sequences, increasing the overall inference computation. Existing acceleration methods usually remove visual tokens or skip visual-token updates in en…

arXiv cs.LG TIER_1 English(EN) · Hongmin Li · 2026-07-01 04:00

Targeted Tests for LLM Reasoning: An Audit-Constrained Protocol

arXiv:2605.11599v3 Announce Type: replace Abstract: Fixed reasoning benchmarks evaluate canonical prompts, but semantically valid changes in presentation can still change model behavior. Studies of prompt variation can reveal such failures, but without audit they can mix genuine …

arXiv cs.CL TIER_1 English(EN) · Xudong Shen, Li Yuan, Ye Chen, Xin Wu, Yi Cai, Zhiyong Wu · 2026-07-01 04:00

Truth or Sophistry? LoFa: A Benchmark for LLM Robustness Against Logical Fallacies

arXiv:2606.31039v1 Announce Type: new Abstract: Large Language Models (LLMs) exhibit strong semantic capabilities, yet their resilience to manipulative linguistic patterns such as logical fallacies remains underexplored. Prior work has primarily examined whether LLMs can identify…

arXiv cs.AI TIER_1 English(EN) · Zijun Di, Bin Lu, Huquan Kang, Luoyi Fu, Jiaxin Ding, Xiaoying Gan, Lei Zhou, Xinbing Wang · 2026-07-01 04:00

Improving LLM Reasoning with Homophily-aware Structural and Semantic Text-Attributed Graph Compression

arXiv:2601.08187v3 Announce Type: replace Abstract: Large language models (LLMs) have demonstrated promising capabilities in Text-Attributed Graph (TAG) understanding. Recent studies typically focus on verbalizing the graph structures via handcrafted prompts, feeding the target n…

arXiv cs.AI TIER_1 English(EN) · Ankur Samanta, Akshayaa Magesh, Tal Lancewicki, Ayush Jain, Youliang Yu, Paul Sajda, Kaveh Hassani, Aditya Modi, Daniel R. Jiang, Yonathan Efroni · 2026-07-01 04:00

BayesBench: Evaluating LLM Belief Trajectories Under Multi-Turn Evidence Accumulation

arXiv:2606.30850v1 Announce Type: new Abstract: Large language models (LLMs) are typically deployed in multi-turn conversations, where each turn provides new evidence that should reduce epistemic uncertainty about their environment. Acting rationally then requires inferring the u…

arXiv cs.AI TIER_1 English(EN) · Chao Wang, Hongtao Tian, Tao Yang, Yunsheng Shi, Ting Yao, Wenbo Ding · 2026-06-30 04:00

Process Advantage Signal Shaping: A Paradigm-Agnostic Middleware for Process-Supervised RL in LLM Reasoners

arXiv:2606.29296v1 Announce Type: new Abstract: Group Relative Policy Optimization (GRPO) is a default recipe for process-supervised reinforcement learning of LLM reasoners, and dense process supervision -- via learned process reward models (PRMs) or on-policy-distillation KL sig…

arXiv cs.LG TIER_1 English(EN) · Pei-Chi Pan, Yingbin Liang, Sen Lin · 2026-06-30 04:00

Reward Modeling for Reinforcement Learning-Based LLM Reasoning: Design, Challenges, and Evaluation

arXiv:2602.09305v2 Announce Type: replace Abstract: Large Language Models (LLMs) demonstrate transformative potential, yet their reasoning remains inconsistent and unreliable. Reinforcement learning (RL)-based fine-tuning is a key mechanism for improvement, but its effectiveness …

arXiv cs.LG TIER_1 English(EN) · Jinda Lu, Kexin Huang, Junkang Wu, Shuo Yang, Jinghan Li, Chiyu Ma, Shaohang Wei, Xiang Wang, Guoyin Wang, Jingren Zhou · 2026-06-30 04:00

Experience Augmented Policy Optimization for LLM Reasoning

arXiv:2606.30420v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) is a powerful paradigm for improving the reasoning capabilities of large language models (LLMs). However, existing RLVR methods typically rely on on-policy optimization from scra…

arXiv cs.CL TIER_1 English(EN) · Jiaqi Li, Fanghui Song · 2026-06-30 04:00

Grounding LLM Reasoning under Incomplete Graph Evidence

arXiv:2606.30247v1 Announce Type: new Abstract: Knowledge graphs can guide large language model (LLM) reasoning, but the graph seen by a system is usually a retrieved, linked, temporally scoped, and incomplete evidence state rather than a complete account of truth. We develop a…

arXiv cs.AI TIER_1 English(EN) · Sirui Li, Shuhan Xiao, Mihir Joshi, Ahmed Metwally, Daniel McDuff, Wei Wang, Yuzhe Yang · 2026-06-30 04:00

HEARTS: Benchmarking LLM Reasoning on Health Time Series

arXiv:2603.06638v3 Announce Type: replace-cross Abstract: The rise of large language models (LLMs) has shifted time series analysis from narrow analytics to general-purpose reasoning. Yet, existing benchmarks cover only a small set of health time series modalities and tasks, fail…

arXiv cs.AI TIER_1 English(EN) · Tiancheng Xing, Jerry Li, Yixuan Du, Xiyang Hu · 2026-06-30 04:00

Are LLMs Reliable Rankers? Rank Manipulation via Two-Stage Token Optimization

arXiv:2510.06732v2 Announce Type: replace-cross Abstract: Large language models (LLMs) are increasingly used as rerankers in information retrieval, yet their ranking behavior can be steered by small, natural-sounding prompts. To expose this vulnerability, we present Rank Anything…

arXiv cs.AI TIER_1 English(EN) · Maohao Ran, Zhenglin Wan, Cooper Lin, Yanting Zhang, Hongyu Xin, Hongwei Fan, Yibo Xu, Beier Luo, Yaxin Zhou, Wangbo Zhao, Lijie Yang, Lang Feng, Fuchao Yang, Jingxuan Wu, Yiqiao Huang, Chendong Ma, Yusen Huang, Dailing Jiang, Jianbo Deng, Sirui Han, Yan… · 2026-06-30 04:00

CaveAgent: Transforming LLMs into Stateful Runtime Operators

arXiv:2601.01569v4 Announce Type: replace Abstract: LLM-based agents are increasingly capable of complex task execution, yet current agentic systems remain constrained by text-centric paradigms that struggle with long-horizon tasks due to fragile multi-turn dependencies and conte…

arXiv cs.AI TIER_1 English(EN) · Marco Aruta, Francesco Improta, Vadim Malvone, Aniello Murano, Vladana Perlic · 2026-06-30 04:00

Translating Natural Language to Strategic Temporal Specifications via LLMs

arXiv:2606.30441v1 Announce Type: cross Abstract: A rigorous formalization of system requirements is a fundamental prerequisite for the verification of Multi-Agent Systems (MAS). However, writing correct formal specifications is well known as an error-prone, time-consuming, and e…

arXiv cs.AI TIER_1 English(EN) · Xiteng Yao, Taeho Kim, Hengzhi Pei, Xinle Liu, Kyle Ulrich, Leonard Lausen, Ashish Khetan, Xiang Song, George Karypis, Martin Herbordt · 2026-06-30 04:00

KernelSight-LM: A Kernel-Level LLM Inference Simulator

arXiv:2606.28565v1 Announce Type: cross Abstract: As large language models (LLMs) move into production serving, practitioners must rapidly evaluate inference performance across diverse hardware, models, and serving parameters to meet cost and latency targets. However, the end-to-…

arXiv cs.AI TIER_1 English(EN) · Jingyao Liu, Danling Meng, Chen Huang, Yukun Yan, Zhenghao Liu, Wenqiang Lei, See-Kiong Ng, Maosong Sun · 2026-06-30 04:00

HippoSpark: An On-Demand Experience System for LLM Reasoning

arXiv:2606.29929v1 Announce Type: new Abstract: Distilling historical trajectories into reusable experience to enhance future problem-solving has become a focal point of recent LLM research. However, existing methods predominantly operate at the task level, leveraging general sum…

arXiv cs.AI TIER_1 English(EN) · Tianlong Wang, Yuhang Wang, Weibin Liao, Xin Gao, Xinyu Ma, Yang Lin, Yasha Wang, Liantao Ma · 2026-06-30 04:00

Search for Truth from Reasoning: A Dynamic Representation Editing Framework for Steering LLM Trajectories

arXiv:2606.28589v1 Announce Type: new Abstract: Current approaches to enhance Large Language Model (LLM) reasoning, such as Chain-of-Thought and "Wait" prompts, primarily encourage models to think more, yet often fail to guide them toward Truth. While Representation Editing (RepE…

arXiv cs.CL TIER_1 English(EN) · Zhiyong Wu · 2026-06-30 02:17

Truth or Sophistry? LoFa: A Benchmark for LLM Robustness Against Logical Fallacies

Large Language Models (LLMs) exhibit strong semantic capabilities, yet their resilience to manipulative linguistic patterns such as logical fallacies remains underexplored. Prior work has primarily examined whether LLMs can identify or classify fallacies, leaving their robustness…

arXiv cs.MA (Multiagent) TIER_1 English(EN) · Vladana Perlić · 2026-06-29 15:15

Translating Natural Language to Strategic Temporal Specifications via LLMs

A rigorous formalization of system requirements is a fundamental prerequisite for the verification of Multi-Agent Systems (MAS). However, writing correct formal specifications is well known as an error-prone, time-consuming, and expertise-intensive task. This difficulty is furthe…

arXiv cs.LG TIER_1 English(EN) · Jingren Zhou · 2026-06-29 15:05

Experience Augmented Policy Optimization for LLM Reasoning

Reinforcement Learning with Verifiable Rewards (RLVR) is a powerful paradigm for improving the reasoning capabilities of large language models (LLMs). However, existing RLVR methods typically rely on on-policy optimization from scratch, resulting in high sampling costs and ineffi…

arXiv cs.CL TIER_1 English(EN) · Fanghui Song · 2026-06-29 12:56

Grounding LLM Reasoning under Incomplete Graph Evidence

Knowledge graphs can guide large language model (LLM) reasoning, but the graph seen by a system is usually a retrieved, linked, temporally scoped, and incomplete evidence state rather than a complete account of truth. We develop a theoretical perspective on grounding observable…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-29 12:56

Grounding LLM Reasoning under Incomplete Graph Evidence

Knowledge graphs can guide large language model (LLM) reasoning, but the graph seen by a system is usually a retrieved, linked, temporally scoped, and incomplete evidence state rather than a complete account of truth. We develop a theoretical perspective on grounding observable…

arXiv cs.AI TIER_1 English(EN) · Yuzhe Wang, Yaochen Zhu, Jundong Li · 2026-06-29 04:00

CausalFlip: A Benchmark for LLM Causal Judgment Beyond Semantic Matching

arXiv:2602.20094v2 Announce Type: replace Abstract: As large language models (LLMs) witness increasing deployment in complex, high-stakes decision-making scenarios, it becomes imperative to ground their reasoning in causality rather than spurious correlations. However, strong per…

arXiv cs.AI TIER_1 English(EN) · Yuhang Chen, Jinhao Duan, Ruichen Zhang, Mingfu Liang, Xiaohan Wei, Yunchen Pu, Fei Tian, Chonglin Sun, Parish Aggarwal, Frank Shyu, Luke Simon, Sandeep Pandey, Tianlong Chen, Xi Liu · 2026-06-29 04:00

End-to-End Dynamic Sparsity for Resource-Adaptive LLM Inference

arXiv:2606.27743v1 Announce Type: cross Abstract: Large Language Model (LLM) inference is typically deployed under a static resource assumption, where models execute a fixed computational graph regardless of the runtime environment. However, real-world cloud infrastructure is i…

arXiv cs.CL TIER_1 English(EN) · Carrie Chen · 2026-06-29 04:00

EntMTP: Accelerating LLM Inference with Entropy Guided Multi Token Prediction

arXiv:2606.27550v1 Announce Type: new Abstract: Multi-token prediction has been shown to increase data density during training, improve downstream text-generation quality, and serve as the de facto approach for self-speculative decoding. Existing foundation and open source models…

arXiv cs.AI TIER_1 English(EN) · Yiheng Tao, Yihe Zhang, Matthew Dearing, Xin Wang, Yuping Fan, Michael E. Papka, Zhiling Lan · 2026-06-29 04:00

Ranking Before Serving: Low-Latency LLM Serving via Pairwise Learning-to-Rank

arXiv:2510.03243v3 Announce Type: replace-cross Abstract: Efficient scheduling of large language model (LLM) inference tasks is critical for achieving low latency and high throughput, a challenge that is becoming increasingly acute with the rise of reasoning-capable LLMs whose ge…

arXiv cs.IR (Information Retrieval) TIER_1 English(EN) · Xi Liu · 2026-06-26 05:48

End-to-End Dynamic Sparsity for Resource-Adaptive LLM Inference

Large Language Model (LLM) inference is typically deployed under a static resource assumption, where models execute a fixed computational graph regardless of the runtime environment. However, real-world cloud infrastructure is inherently dynamic, characterized by fluctuating av…

arXiv cs.AI TIER_1 English(EN) · Derek Thomas · 2026-06-26 04:00

Context Recycling for Long-Horizon LLM Inference

arXiv:2606.26105v1 Announce Type: cross Abstract: Large language models (LLMs) exhibit strong capabilities in short-context reasoning but degrade in performance over long conversational horizons due to context window limitations and inefficient token usage. We introduce ContextFo…

arXiv cs.CL TIER_1 English(EN) · Jinghan Wang, Yanjun Chen, Wei Zhang, Xiaotong Huang, Tianchen Liu, Gaoliang Peng · 2026-06-26 04:00

Cascaded Multi-Granularity Pruning for On-Device LLM Inference in Industrial IoT

arXiv:2606.26861v1 Announce Type: new Abstract: Deploying large language models (LLMs) on Industrial Internet of Things (IIoT) edge devices demands extreme compression, yet existing structured pruning methods collapse at high compression ratios due to one-shot importance estimati…

arXiv cs.AI TIER_1 English(EN) · Haoqian Meng, Yilun Luo, Yafei Zhao, Wenyuan Liu, Huaqing Zheng, Xindian Ma, Peng Zhang · 2026-06-26 04:00

SharQ: Bridging Activation Sparsity and FP4 Quantization for LLM Inference

arXiv:2606.26587v1 Announce Type: cross Abstract: Low-bit floating-point formats and semi-structured sparsity are increasingly supported by modern accelerators, yet combining them for LLM activation compression remains challenging: activations contain input-dependent outliers tha…

arXiv cs.CL TIER_1 English(EN) · Carrie Chen · 2026-06-25 20:54

EntMTP: Accelerating LLM Inference with Entropy Guided Multi Token Prediction

Multi-token prediction has been shown to increase data density during training, improve downstream text-generation quality, and serve as the de facto approach for self-speculative decoding. Existing foundation and open source models that use MTP heads commit to a static tree-base…

arXiv cs.CL TIER_1 English(EN) · Gaoliang Peng · 2026-06-25 10:44

Cascaded Multi-Granularity Pruning for On-Device LLM Inference in Industrial IoT

Deploying large language models (LLMs) on Industrial Internet of Things (IIoT) edge devices demands extreme compression, yet existing structured pruning methods collapse at high compression ratios due to one-shot importance estimation, and their cross-architecture behavior remain…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-25 10:44

Cascaded Multi-Granularity Pruning for On-Device LLM Inference in Industrial IoT

Deploying large language models (LLMs) on Industrial Internet of Things (IIoT) edge devices demands extreme compression, yet existing structured pruning methods collapse at high compression ratios due to one-shot importance estimation, and their cross-architecture behavior remain…

arXiv cs.LG TIER_1 English(EN) · Peng Zhang · 2026-06-25 04:19

SharQ: Bridging Activation Sparsity and FP4 Quantization for LLM Inference

Low-bit floating-point formats and semi-structured sparsity are increasingly supported by modern accelerators, yet combining them for LLM activation compression remains challenging: activations contain input-dependent outliers that dominate block scales in FP4 quantization, and d…

arXiv cs.LG TIER_1 English(EN) · DatologyAI: Matthew L. Leavitt, Siddharth Joshi, Haoli Yin, Rishabh Adiga, Haakon Mongstad, Alvin Deng, David Schwab, Bogdan Gaza, Ari Morcos · 2026-06-25 04:00

Brevity is the Soul of Inference Efficiency: Inducing Concision in VLMs via Data Curation

arXiv:2606.25432v1 Announce Type: new Abstract: Inference efficiency is typically pursued by shrinking the model: distillation, pruning, quantization, and sparse routing each lower per-token cost while treating token count as fixed. But output length has been inflating, and it is…

arXiv cs.LG TIER_1 English(EN) · Stefan Wahl, Raphaela Schenk, Ali Farnoud, Jakob H. Macke, Daniel Gedon · 2026-06-25 04:00

A Probabilistic Framework for LLM-Based Model Discovery

arXiv:2602.18266v2 Announce Type: replace Abstract: Automated methods for discovering mechanistic simulator models from observational data offer a promising path toward accelerating scientific progress. Such methods often take the form of agentic-style iterative workflows that re…

arXiv cs.CL TIER_1 English(EN) · Jaeyong Ko, Pilsung Kang, Yukyung Lee · 2026-06-25 04:00

Cliff Tokens: Identifying Single-Token Failure Triggers in LLM Mathematical Reasoning

arXiv:2606.25524v1 Announce Type: cross Abstract: Large language models (LLMs) reach high accuracy in mathematical reasoning, but individual traces on the same problem diverge; some arrive at the correct answer while others fail. Prior work analyzes failure at the step, chunk, or…

arXiv cs.LG TIER_1 English(EN) · Francisco Ferreira da Silva, Stefan Heimersheim · 2026-06-25 04:00

Evidence for feature-specific error correction in LLMs

arXiv:2606.24964v1 Announce Type: new Abstract: Understanding the features of large language models (LLMs) is a central goal of interpretability. LLMs are commonly assumed to use superposition to represent more features than they have dimensions. They may not only represent featu…

arXiv cs.AI TIER_1 English(EN) · Yukyung Lee · 2026-06-24 08:03

Cliff Tokens: Identifying Single-Token Failure Triggers in LLM Mathematical Reasoning

Large language models (LLMs) reach high accuracy in mathematical reasoning, but individual traces on the same problem diverge; some arrive at the correct answer while others fail. Prior work analyzes failure at the step, chunk, or sentence level, or at tokens where failure has al…

arXiv cs.AI TIER_1 English(EN) · Ari Morcos · 2026-06-24 05:50

Brevity is the Soul of Inference Efficiency: Inducing Concision in VLMs via Data Curation

Inference efficiency is typically pursued by shrinking the model: distillation, pruning, quantization, and sparse routing each lower per-token cost while treating token count as fixed. But output length has been inflating, and it is precisely the component the standard toolkit le…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-24 05:50

Brevity is the Soul of Inference Efficiency: Inducing Concision in VLMs via Data Curation

Inference efficiency is typically pursued by shrinking the model: distillation, pruning, quantization, and sparse routing each lower per-token cost while treating token count as fixed. But output length has been inflating, and it is precisely the component the standard toolkit le…

arXiv cs.AI TIER_1 English(EN) · Ismail Labiad, Mathurin Videau, Matthieu Kowalski, Marc Schoenauer, Alessandro Leite, Julia Kempe, Olivier Teytaud · 2026-06-24 04:00

Tuning without Peeking: Provable Generalization Bounds and Robust LLM Post-Training

arXiv:2507.01752v4 Announce Type: replace-cross Abstract: Gradient-based optimization is the workhorse of deep learning, offering efficient and scalable training via backpropagation. However, exposing gradients during training can leak sensitive information about the underlying d…

arXiv cs.LG TIER_1 English(EN) · Bohua Zou, Nian Liu, Binqi Sun, Matteo Mascherin, Debayan Roy, Yutao Liu, Yu Peng, Ning Jia, Haibo Chen · 2026-06-24 04:00

EnerInfer: Energy-Aware On-Device LLM Inference

arXiv:2606.23001v1 Announce Type: cross Abstract: On-device LLM inference is increasingly attractive for privacy-preserving, reliable, and cost-effective deployment, yet its energy and thermal costs remain a critical bottleneck. Existing systems primarily optimize for decoding sp…

arXiv cs.CL TIER_1 English(EN) · Quan Xiao, Yutong Xuan, Gaowen Liu, Ramana Rao Kompella, Tianyi Chen · 2026-06-24 04:00

Bilevel Data Curation for LLM Fine-tuning: Offline Selection and Online Self-Refining Generation

arXiv:2511.21056v2 Announce Type: replace-cross Abstract: Supervised fine-tuning (SFT) datasets are critical to the downstream performance of large language models, yet they often contain low-quality or harmful question-response pairs. To improve SFT data quality, we develop a un…

arXiv cs.AI TIER_1 English(EN) · Zijin Hong, Hao Wu, Su Dong, Junnan Dong, Yilin Xiao, Yujing Zhang, Zhu Wang, Feiran Huang, Linyi Li, Hongxia Yang, Xiao Huang · 2026-06-24 04:00

Benchmarking LLMs' Mathematical Reasoning with Unseen Random Variables Questions

arXiv:2501.11790v5 Announce Type: replace-cross Abstract: Recent studies have raised significant concerns regarding the reliability of current mathematics benchmarks, highlighting issues such as simplistic design and potential data contamination. Consequently, developing a reliab…

arXiv cs.AI TIER_1 English(EN) · Yucheng Wu, Jundong Xu, Mingzhen Ju, Yue Yu, Chenpeng Wang, Haoxuan Li, Liangming Pan · 2026-06-24 04:00

HOLMES: Evaluating Higher-Order Logical Reasoning in LLMs

arXiv:2606.23238v2 Announce Type: replace Abstract: Logical reasoning is essential for reliable AI, yet existing benchmarks are largely first-order-logic-centric, focusing on object-level deduction over fixed predicates. This misses many realistic scenarios where models must reas…

arXiv cs.AI TIER_1 English(EN) · Tianbao Ma, Chang Xi, Yichuan Zou, Chengen Li, Linxun Chen, Zilong Lu, Yanan Niu, Zhaojie Liu, Han Li, Kun Gai · 2026-06-24 04:00

ScaleToT: Generalizing Structured LLM Reasoning for Billion-Scale Low-Activity User Modeling

arXiv:2606.24605v1 Announce Type: new Abstract: Accurate user modeling often depends on rich interaction histories, which are unavailable for billions of low-activity users. Large Language Models (LLMs) can infer latent user states from static profiles, but this reasoning becomes…

arXiv cs.AI TIER_1 English(EN) · Xiaolin Lin, Jingcun Wang, Olga Kondrateva, Yiyu Shi, Bing Li, Grace Li Zhang · 2026-06-24 04:00

CompressKV: Semantic-Retrieval-Guided KV-Cache Compression for Resource-Efficient Long-Context LLM Inference

arXiv:2606.24467v1 Announce Type: new Abstract: Long-context large language model (LLM) inference is increasingly constrained by the memory footprint and decoding cost of key-value (KV) caches, limiting sustainable deployment on resource-constrained hardware. Existing KV cache ev…

arXiv cs.AI TIER_1 English(EN) · Kun Gai · 2026-06-23 14:05

ScaleToT: Generalizing Structured LLM Reasoning for Billion-Scale Low-Activity User Modeling

Accurate user modeling often depends on rich interaction histories, which are unavailable for billions of low-activity users. Large Language Models (LLMs) can infer latent user states from static profiles, but this reasoning becomes unreliable when profiles are sparse, and applyi…

arXiv cs.AI TIER_1 English(EN) · Grace Li Zhang · 2026-06-23 11:59

CompressKV: Semantic-Retrieval-Guided KV-Cache Compression for Resource-Efficient Long-Context LLM Inference

Long-context large language model (LLM) inference is increasingly constrained by the memory footprint and decoding cost of key-value (KV) caches, limiting sustainable deployment on resource-constrained hardware. Existing KV cache eviction methods typically apply heuristic token s…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-23 11:59

CompressKV: Semantic-Retrieval-Guided KV-Cache Compression for Resource-Efficient Long-Context LLM Inference

Long-context large language model (LLM) inference is increasingly constrained by the memory footprint and decoding cost of key-value (KV) caches, limiting sustainable deployment on resource-constrained hardware. Existing KV cache eviction methods typically apply heuristic token s…

Alignment Forum TIER_1 English(EN) · Josh Engels · 2026-06-22 22:26

LLM-Driven Feature Discovery

We would often like to get a qualitative sense of a target model’s behaviors in important distributions (e.g. deployment, RL training, or evals). For example, we might want to discover novel behaviors (https://alignment.anthropic.com/2026/petri-v2/)…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-22 16:06

Concordia: JIT-Compiled Persistent-Kernel Checkpointing for Fault-Tolerant LLM Inference

Long-running LLM agents keep valuable state resident on GPUs: KV caches, request schedulers, communication state, and sometimes online adapters. Losing this state after a GPU or communicator failure can discard minutes to hours of work, yet existing recovery mechanisms either res…

arXiv cs.LG TIER_1 English(EN) · Chen Qian · 2026-06-22 16:06

Concordia: JIT-Compiled Persistent-Kernel Checkpointing for Fault-Tolerant LLM Inference

Long-running LLM agents keep valuable state resident on GPUs: KV caches, request schedulers, communication state, and sometimes online adapters. Losing this state after a GPU or communicator failure can discard minutes to hours of work, yet existing recovery mechanisms either res…

arXiv cs.CL TIER_1 English(EN) · Fanghen Li · 2026-06-22 14:19

Do LLM Embedding Spaces Recover Expert Structure?

Pretrained text embeddings are increasingly used as representational maps, yet high category separability does not imply that their geometry recovers expert-defined structure. We study this problem in mental-health-related language, where symptom relations provide an external ref…

arXiv cs.CL TIER_1 English(EN) · Xiangnan He · 2026-06-22 12:58

Towards Root Memories: Benchmarking and Enhancing Implicit Logical Memory Retrieval for Personalized LLMs

Memory systems are essential for personalized Large Language Models (LLMs). However, existing retrieval methods in these systems primarily rely on semantic similarity, potentially missing logically critical memories with limited semantic overlap. Current benchmarks remain inadequ…

arXiv cs.CL TIER_1 English(EN) · Wen Zhang · 2026-06-22 12:50

Scaling LLM Knowledge Boundaries via Distribution-Optimized Synthesis

Knowledge injection via synthetic data is crucial for enhancing Large Language Models (LLMs). However, current synthesis methods simply stop at preset token counts or fixed data ratios, lacking awareness of knowledge distribution. This results in some domains being sparse while o…

arXiv cs.AI TIER_1 English(EN) · Liangming Pan · 2026-06-22 12:23

HOLMES: Evaluating Higher-Order Logical Reasoning in LLMs

Logical reasoning is essential for reliable AI, yet existing benchmarks are largely first-order-logic-centric, focusing on object-level deduction over fixed predicates. This misses many realistic scenarios where models must reason over rules, predicates, functions, constraints, a…

arXiv cs.CL TIER_1 English(EN) · Rada Mihalcea · 2026-06-20 04:12

The Language-Energy Divide: Measuring Energy Costs of Multilingual LLM Inference

Large language models (LLMs) are increasingly deployed in multilingual settings, yet the energy costs of serving these models across different languages remain poorly understood. We present a systematic study of inference energy consumption across languages with ML.Energy framewo…

arXiv cs.CL TIER_1 English(EN) · Joel Stremmel · 2026-06-19 20:23

Denoising Iterative Self-Correction: Structured Verification Loops for Reliable LLM Reasoning

Large language models produce fluent but often incorrect multi-step reasoning, and naive correction methods risk degrading already-correct answers. We introduce Denoising Iterative Self-Correction (DISC), a test-time procedure that treats verification question outputs as noisy me…

arXiv cs.CL TIER_1 English(EN) · Shiguo Lian, Kai Wang, Zhaoxiang Liu, Wen Liu, Minjie Hua, Yutong Liu, Jiangze Yan, Xin Wang, Cong Wang, Yilin Zhang, Yi Shen, Jieyun Huang, Fang Zhao, Huanlin Gao, Ping Chen, Xinyu Yang, Kaikai Zhao, Yao Zhao, Xinggang Wang, Huishuai Zhang, Dongyan Zhao… · 2026-06-19 04:00

Token-Operations-Oriented Inference Optimization Techniques for Large Models

arXiv:2606.20295v1 Announce Type: cross Abstract: Large model inference optimization serves as a key foundation for supporting the scalable, low-cost, and highly stable operation of large model services. Centered on token-oriented inference optimization technology, this paper pro…

arXiv cs.CL TIER_1 English(EN) · Yu Deng · 2026-06-19 04:00

GEMS: Geometric Constraints Enable Multi-Semantic Superposition in LLMs

arXiv:2606.19946v1 Announce Type: new Abstract: Activation steering controls model behavior by modifying intermediate hidden states at inference time without retraining. Existing methods handle only single-direction injection; when multiple semantic directions are superposed with…

arXiv cs.CL TIER_1 English(EN) · Jinseok Chung, Minkyoung Song, Hyunji Jung, Namhoon Lee · 2026-06-19 04:00

Quantifying Aleatoric Uncertainty of In-Context Learning for Robust Measure of LLM Prediction Confidence

arXiv:2606.19353v1 Announce Type: new Abstract: In-Context Learning (ICL) allows LLMs to adapt to new tasks from a few demonstrations, but its reliability remains a concern: predictions are highly sensitive to both prompt design and the model's ability to understand the context, …

arXiv cs.LG TIER_1 English(EN) · Abhinit Sen, Ajeet Kumar, Manaranjan Pradhan · 2026-06-19 04:00

Closing the Social-Semantic Gap: SPSD for Edge-Based Prompt Compression in Cloud LLM Inference

arXiv:2606.19364v1 Announce Type: new Abstract: The prefill stage of Large Language Model (LLM) inference is a growing contributor to cloud-scale energy cost. Many consumer-support and conversational prompts contain social scaffolding: politeness markers, apologetic preamble, rep…

arXiv cs.AI TIER_1 English(EN) · Xuanzhi Feng, Zhengyang Li, Zeyu Liu, Haoxi Li, Yuming Jiang, Bing Guo, Jingcai Guo, Jie Zhang, Song Guo · 2026-06-19 04:00

Beyond Entropy: Learning from Token-Level Distributional Deviations for LLM Reasoning

arXiv:2606.19771v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has significantly advanced Large Language Model (LLM) reasoning; however, it faces a fundamental optimization instability: uniform token updates precipitate entropy collapse, lea…

arXiv cs.CL TIER_1 English(EN) · Qinghuai Ma · 2026-06-18 14:33

Token-Operations-Oriented Inference Optimization Techniques for Large Models

Large model inference optimization serves as a key foundation for supporting the scalable, low-cost, and highly stable operation of large model services. Centered on token-oriented inference optimization technology, this paper proposes for the first time a four-layer technical ar…

arXiv cs.CL TIER_1 English(EN) · Yu Deng · 2026-06-18 08:43

GEMS: Geometric Constraints Enable Multi-Semantic Superposition in LLMs

Activation steering controls model behavior by modifying intermediate hidden states at inference time without retraining. Existing methods handle only single-direction injection; when multiple semantic directions are superposed without constraints, the model collapses. We show th…

arXiv cs.LG TIER_1 English(EN) · Jiaxing Wang, Deping Xiang, Jin Xu, Zirui Liu, Zicheng Zhang, Guoqiang Gong, Jun Fang, Chao Liu, Pengzhang Liu, Tongxuan Liu, Ke Zhang, Qixia Jiang · 2026-06-18 04:00

BLADE: Scalable Bi-level Adaptive Data Selection for LLM Training

arXiv:2606.18650v1 Announce Type: new Abstract: As Large Language Model (LLM) datasets scale to trillions of tokens, data selection has emerged as a critical frontier to filter out uninformative noise and construct adaptive learning trajectories. Beyond static heuristic filtering…

arXiv cs.LG TIER_1 English(EN) · Yueying Li, Yuanfan Chen, Jiayang Chen, Esha Choukse, Haoran Qiu, G. Edward Suh, Rodrigo Fonseca, Ziv Scully, Udit Gupta · 2026-06-18 04:00

Beyond Prediction: Tail-Aware Scheduling for LLM Inference

arXiv:2606.18431v1 Announce Type: new Abstract: LLM serving exhibits extreme length variability, making size-based scheduling difficult in practice. Recent LLM schedulers approximate SJF/SRPT using predicted decode lengths or ranks and primarily report mean-centric metrics such a…
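
As a rough illustration of the size-based scheduling the abstract refers to, the sketch below orders requests by a predicted decode length (shortest job first) and compares the mean waiting time against arrival order. The request ids, the predicted lengths, and the `mean_wait` and `sjf` helpers are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Request = Tuple[str, int]  # (request id, predicted decode length in tokens)

def mean_wait(order: List[Request]) -> float:
    """Mean time each request waits before it starts, serving one at a time."""
    elapsed = 0
    total_wait = 0
    for _, length in order:
        total_wait += elapsed
        elapsed += length
    return total_wait / len(order)

def sjf(requests: List[Request]) -> List[Request]:
    """Shortest-job-first: serve the smallest predicted length first."""
    return sorted(requests, key=lambda r: r[1])

requests = [("a", 900), ("b", 50), ("c", 200)]  # hypothetical predictions
fifo_wait = mean_wait(requests)       # arrival order
sjf_wait = mean_wait(sjf(requests))   # shortest predicted job first
```

In this toy case the mean wait drops under SJF, but the longest request is served last, which is one reason a mean-only metric can hide what happens to the slowest requests.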

arXiv cs.AI TIER_1 English(EN) · Yan Scholten, Sophie Xhonneux, Leo Schwinn, Stephan Günnemann · 2026-06-18 04:00

Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs

arXiv:2507.04219v5 Announce Type: replace-cross Abstract: Current unlearning methods for LLMs optimize on the private information they seek to remove by incorporating it into their fine-tuning data. We argue this not only risks reinforcing exposure to sensitive data, but also fun…

arXiv cs.AI TIER_1 English(EN) · Shabari S Nair, Krishanu Saini · 2026-06-17 04:00

Towards Distributed Inference of LLMs on a P2P Network

arXiv:2606.17059v1 Announce Type: cross Abstract: Prefix caching can reduce LLM inference latency by reusing KV caches across requests with shared prompts, but cluster-scale reuse is challenging because caches are partitioned across nodes. We propose a decentralized, prefix-cache…
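
The prefix-cache reuse described above can be sketched on a single node as a lookup for the longest cached prefix of an incoming prompt. This is a minimal sketch under stated assumptions: the `PrefixCache` class, the token ids, and the string standing in for a KV cache are all hypothetical, and the decentralized lookup the paper proposes is not modeled.

```python
from typing import Dict, List, Tuple

class PrefixCache:
    """Toy prefix cache: maps token prefixes to a placeholder for their KV state."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[int, ...], str] = {}

    def put(self, tokens: List[int], kv_handle: str) -> None:
        """Remember that the KV state for exactly these tokens is available."""
        self._store[tuple(tokens)] = kv_handle

    def longest_prefix(self, tokens: List[int]) -> int:
        """Return the length of the longest cached prefix of `tokens` (0 if none)."""
        for end in range(len(tokens), 0, -1):
            if tuple(tokens[:end]) in self._store:
                return end
        return 0

cache = PrefixCache()
cache.put([1, 2, 3, 4], "kv-for-shared-prompt")   # hypothetical shared prompt
hit = cache.longest_prefix([1, 2, 3, 4, 9, 9])    # shared prompt plus a new suffix
miss = cache.longest_prefix([7, 8])               # nothing reusable
```

Only the tokens after the matched prefix would need a fresh prefill, which is where the latency saving comes from.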

arXiv cs.AI TIER_1 English(EN) · Shun Usami, Venkatram Vishwanath, E. Wes Bethel · 2026-06-17 04:00

Prefill/Decode-Aware Evaluation of LLM Inference on Emerging AI Accelerators

arXiv:2606.17104v1 Announce Type: cross Abstract: As large language models (LLMs) are increasingly deployed in latency- and cost-sensitive settings, inference efficiency has become a central systems challenge. While GPUs dominate current deployments, a growing number of AI accele…

arXiv cs.AI TIER_1 English(EN) · Jessica McFadyen, Ole Jorgensen, Harry Coppock, Kevin Wei, Cozmin Ududec · 2026-06-17 04:00

How Inference Compute Shapes Frontier LLM Evaluation

arXiv:2606.17930v1 Announce Type: new Abstract: AI evaluations are shifting toward harder tasks that benefit from longer trajectories involving tool use and iterative problem solving. As a result, performance is increasingly sensitive to the amount and allocation of compute avail…

arXiv cs.LG TIER_1 English(EN) · Md Abdullah Al Mamun, Ngoc Phu Doan, Pedram Zaree, Ihsen Alouani, Nael Abu-Ghazaleh · 2026-06-17 04:00

Loss Landscape Poisoning: Targeted Extraction of Unseen Training Data from LLMs

arXiv:2606.17110v1 Announce Type: cross Abstract: Large Language Models are increasingly trained on proprietary or sensitive data, from private healthcare and financial records to user conversations containing secrets. Ensuring the privacy of such data against extraction attacks …

arXiv cs.CL TIER_1 English(EN) · Dong Huang, Jianbo Sun, Pengkun Yang · 2026-06-17 04:00

Prompt Perturbation for Reliable LLM Evaluation over Comparison Graphs

arXiv:2606.17634v1 Announce Type: new Abstract: Evaluating large language models (LLMs) is important for understanding their capabilities, comparing competing systems, and supporting the deployment of reliable models in practice. For open-ended tasks, pairwise evaluation has beco…

arXiv cs.CL TIER_1 English(EN) · Filip Sondej, Yushi Yang, Adam Mahdi · 2026-06-17 04:00

RepSelect: Robust LLM Unlearning via Representation Selectivity

arXiv:2606.17168v1 Announce Type: new Abstract: Making large language models (LLMs) deeply forget specific knowledge and values without sacrificing general capabilities remains a central challenge in unlearning. However, current methods are easily reversed by fine-tuning or few-s…

arXiv cs.AI TIER_1 English(EN) · Cozmin Ududec · 2026-06-16 13:40

How Inference Compute Shapes Frontier LLM Evaluation

AI evaluations are shifting toward harder tasks that benefit from longer trajectories involving tool use and iterative problem solving. As a result, performance is increasingly sensitive to the amount and allocation of compute available at test time ("inference compute"). Yet man…

arXiv cs.CL TIER_1 English(EN) · Pengkun Yang · 2026-06-16 07:44

Prompt Perturbation for Reliable LLM Evaluation over Comparison Graphs

Evaluating large language models (LLMs) is important for understanding their capabilities, comparing competing systems, and supporting the deployment of reliable models in practice. For open-ended tasks, pairwise evaluation has become a popular paradigm, in which two responses to…

arXiv cs.AI TIER_1 English(EN) · Ziqun Chen, Ming Wu, Michael Heinrich, Jason Zeng, Huiying Lan, Tianwei Zhang, Rui Tan · 2026-06-16 04:00

Communication-Efficient Verifiable Attention for LLM Inference

arXiv:2606.16352v1 Announce Type: cross Abstract: Computation integrity of remote large language model (LLM) serving can be questionable. For conventional deep neural networks (DNNs), the existing TEE-shielded DNN partitioning (TSDP) approach uses Trusted Execution Environment (T…

arXiv cs.LG TIER_1 English(EN) · Lorenzo Sani, Alex Iacob, Zeyu Cao, Royson Lee, Bill Marino, Yan Gao, Dongqi Cai, Zexi Li, Wanru Zhao, Xinchi Qiu, Nicholas D. Lane · 2026-06-16 04:00

Photon: Federated LLM Pre-Training

arXiv:2411.02908v2 Announce Type: replace Abstract: Scaling large language models (LLMs) demands extensive data and computing resources, which are traditionally constrained to data centers by the high-bandwidth requirements of distributed training. Low-bandwidth methods like fede…

arXiv cs.LG TIER_1 English(EN) · Yingnan Zhao, Razvan Bunescu, Ahmed Louri, Avinash Karanth, Ke Wang · 2026-06-16 04:00

A Spatio-Temporal Expert Prefetching Framework for Efficient MoE-based LLM Inference

arXiv:2606.15453v1 Announce Type: cross Abstract: Mixture-of-Experts (MoE) based large language models (LLMs), such as Qwen and DeepSeek, have recently emerged as an effective approach to improving model capacity without proportionally increasing computational cost. By replacing …

arXiv cs.LG TIER_1 English(EN) · Alexander Yukhimchuk, Andrey Shulga, Mladen Kolar, Martin Takáč · 2026-06-16 04:00

Privacy from Symmetry: Orthogonally Equivariant Transformers for LLM Inference

arXiv:2606.16461v1 Announce Type: new Abstract: Running large language models locally is often impractical, pushing inference on sensitive text to third-party providers. Split inference partially mitigates this by keeping tokens on the client and sending only hidden representatio…

arXiv cs.CL TIER_1 English(EN) · Joris Köster, Zixuan Liu, Siavash Khajavi, Zizhan Zheng · 2026-06-16 04:00

MemBoost: A Memory-Boosted Framework for Cost-Aware LLM Inference

arXiv:2603.26557v2 Announce Type: replace Abstract: Large Language Models (LLMs) deliver strong performance but incur high inference cost in real-world services, especially under workloads with repeated or near-duplicate queries across users and sessions. In this work, we propose…

arXiv cs.CL TIER_1 English(EN) · Jie Hu, Shengnan Wang, Yutong He, Ping Gong, Jiawei Yi, Juncheng Zhang, Youhui Bai, Renhai Chen, Gong Zhang, Cheng Li, Kun Yuan · 2026-06-16 04:00

CentroidKV: Efficient Long-Context LLM Inference via KV Cache Clustering

arXiv:2506.11418v2 Announce Type: replace Abstract: Large language models (LLMs) with extended context windows have become increasingly prevalent for tackling complex tasks. However, the substantial Key-Value (KV) cache required for long-context LLMs poses significant deployment …

arXiv cs.CL TIER_1 English(EN) · Yangjia Hu, Haodong Wang, Zicong Hong, Qianli Liu, Quanxin Shou, Jian Lin, Song Guo, Xiaowei Shen, Xiangjun Huang, Dian Wang, Jian Yang · 2026-06-16 04:00

MosaicQuant: Inlier-Outlier Disaggregation for Unified 4-Bit LLM Quantization

arXiv:2606.15652v1 Announce Type: cross Abstract: 4-bit quantization significantly reduces the memory footprint and accelerates the inference of large language models (LLMs). However, its limited bit-width representation struggles to faithfully capture both dense common values (\…

arXiv cs.AI TIER_1 English(EN) · Jing Ma, Chenhao Dang, Mingjie Liao · 2026-06-16 04:00

AC-ODM: Actor-Critic Online Data Mixing for Sample-Efficient LLM Pretraining

arXiv:2505.23878v2 Announce Type: replace-cross Abstract: Optimizing pretraining data composition is pivotal for LLM generalization. While dynamic mixing outperforms static strategies by capturing evolving training dynamics, current methods fail to reconcile computational efficie…

arXiv cs.AI TIER_1 English(EN) · Youngcheon You, Banseok Lee, Minseop Choi, Seonyoung Kim, Hyochan Chong, Changdong Kim, Youngmin Kim, Dongkyu Kim · 2026-06-16 04:00

RaBiT: Residual-Aware Binarization Training for Accurate and Efficient LLMs

arXiv:2602.05367v3 Announce Type: replace Abstract: Efficient deployment of large language models (LLMs) requires extreme quantization, forcing a critical trade-off between low-bit efficiency and performance. Residual binarization enables hardware-friendly, matmul-free inference …

arXiv cs.AI TIER_1 English(EN) · Yizhen Yao, Qinglin Zhu, Runcong Zhao, Xiangxiang Dai, Yanzheng Xiang, Yulan He, Lin Gui · 2026-06-16 04:00

Follow the Latent Roadmap: Navigating Revocable Decoding for Diffusion LLMs with Anchor Tokens

arXiv:2606.16847v1 Announce Type: cross Abstract: Diffusion Large Language Models (dLLMs) offer a promising avenue for parallel generation but face a trade-off between decoding speed and quality. While revocable decoding strategies attempt to mitigate errors by verifying and rema…

arXiv cs.AI TIER_1 English(EN) · Feiyang Chen, Haibo Chen · 2026-06-16 04:00

SMEPilot: Characterizing and Optimizing LLM Inference with Scalable Matrix Extensions

arXiv:2606.16332v1 Announce Type: cross Abstract: Modern CPUs increasingly integrate matrix extensions, such as Arm Scalable Matrix Extension (SME), that provide high-throughput matrix execution within the CPU. For LLM inference, however, these units are not a universal replaceme…

arXiv cs.AI TIER_1 English(EN) · Jinlong Yang · 2026-06-16 04:00

Heteroskedastic Signals in Budgeted LLM Verification: Structural Heterogeneity Limits Optimization Gains

arXiv:2606.15841v1 Announce Type: new Abstract: Large language model (LLM) systems increasingly use uncertainty signals to allocate limited computation across verification, test-time scaling, tool execution, and other selective-compute decisions. Such policies rely on a glo…

arXiv cs.CL TIER_1 English(EN) · Adam Mahdi · 2026-06-15 18:06

RepSelect: Robust LLM Unlearning via Representation Selectivity

Making large language models (LLMs) deeply forget specific knowledge and values without sacrificing general capabilities remains a central challenge in unlearning. However, current methods are easily reversed by fine-tuning or few-shot prompting, suggesting their forgetting is on…

arXiv cs.AI TIER_1 English(EN) · Lin Gui · 2026-06-15 15:23

Follow the Latent Roadmap: Navigating Revocable Decoding for Diffusion LLMs with Anchor Tokens

Diffusion Large Language Models (dLLMs) offer a promising avenue for parallel generation but face a trade-off between decoding speed and quality. While revocable decoding strategies attempt to mitigate errors by verifying and remasking tokens, they typically operate within a mixe…

arXiv cs.AI TIER_1 English(EN) · Anas Nassar, Steve Mohr, Leonard Apanasevich, Himanshu Sharma · 2026-06-15 04:00

STREAM: Multi-Tier LLM Inference Middleware with Dual-Channel HPC Token Streaming

arXiv:2606.13968v1 Announce Type: cross Abstract: Researchers and practitioners working with large language models face a fragmented landscape: local models are free and private but hardware limits the model size and context windows a researcher can use; institutional HPC centers…

arXiv cs.AI TIER_1 English(EN) · Hengjie Cao, Zhendong Huang, Mengyi Chen, Yifeng Yang, Fang Dong, Anrui Chen, Ruijun Huang, Xin Zhang, Mingzhi Dong, Yujiang Wang, Jinlong Hou, Qin Lv, Robert P. Dick, Yuan Cheng, Tun Lu, Fan Yang, Yixuan Chen, Li Shang · 2026-06-15 04:00

The Curse and Blessing of Mean Bias in FP4-Quantized LLM Training

arXiv:2603.10444v2 Announce Type: replace-cross Abstract: FP4 training promises substantial memory and compute savings for large language models, but remains fragile because blockwise quantization is dictated by extreme activation magnitudes, which inflate dynamic range and compr…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-15 00:00

RepSelect: Robust LLM Unlearning via Representation Selectivity

RepSelect isolates forget-set-specific representations in LLMs by collapsing top principal components of weight gradients, achieving deeper and more robust unlearning compared to existing methods.

arXiv cs.AI TIER_1 English(EN) · Xucong Wang, Ziyu Ma, Yong Wang, Shidong Yang, Hailang Huang, Renda Li, Pengkun Wang, Xiangxiang Chu · 2026-06-12 04:00

ReSum: Synergizing LLM Reasoning and Summarization with Reinforcement Learning

arXiv:2606.13316v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) is a central technique for improving long-horizon reasoning in Large Language Models (LLMs). However, existing RLVR methods often encourage unnecessarily long reasoning rollouts,…

arXiv cs.AI TIER_1 English(EN) · Wenbo Chen, Puheng Li, Mengyang Liu, Weijie Su, Tianpei Xie · 2026-06-12 04:00

MARS: Margin-Adversarial Risk-controlled Stopping for Parallel LLM Test-time Scaling

arXiv:2606.12935v1 Announce Type: new Abstract: Parallel test-time scaling samples many reasoning traces and majority-votes their answers, improving LLM accuracy but requiring traces to run to completion, incurring substantial computational overhead. We observe that probing parti…
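
The baseline the abstract starts from, sampling several traces and majority-voting their final answers, can be sketched in a few lines. The sampled answers and the `majority_vote` helper are hypothetical; the early-stopping rule the paper proposes is not shown here.

```python
from collections import Counter
from typing import List

def majority_vote(answers: List[str]) -> str:
    """Return the most common final answer among the sampled traces."""
    return Counter(answers).most_common(1)[0][0]

# Final answers extracted from five hypothetical reasoning traces.
sampled = ["42", "41", "42", "42", "17"]
winner = majority_vote(sampled)
```

Every trace has to finish before its answer can be counted, which is the computational overhead the abstract points to.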

arXiv cs.AI TIER_1 English(EN) · Xiangxiang Chu · 2026-06-11 13:10

ReSum: Synergizing LLM Reasoning and Summarization with Reinforcement Learning

Reinforcement Learning with Verifiable Rewards (RLVR) is a central technique for improving long-horizon reasoning in Large Language Models (LLMs). However, existing RLVR methods often encourage unnecessarily long reasoning rollouts, which can degrade reasoning coherence and exhau…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-11 05:56

MARS: Margin-Adversarial Risk-controlled Stopping for Parallel LLM Test-time Scaling

Parallel test-time scaling samples many reasoning traces and majority-votes their answers, improving LLM accuracy but requiring traces to run to completion, incurring substantial computational overhead. We observe that probing partial traces at intermediate checkpoints can extrac…

arXiv cs.AI TIER_1 English(EN) · Wesley Pang, Gregory Hyegang Jun, Feiyang Liu, Deming Chen · 2026-06-11 04:00

TileFuse: A Fused Mixed-Precision Kernel Library for Efficient Quantized LLM Inference on AMD NPUs

arXiv:2606.11357v1 Announce Type: cross Abstract: With the growing demand for on-device LLM inference, edge SoCs increasingly integrate NPUs to improve performance and energy efficiency under tight power and thermal budgets. However, practical LLM deployment on current client NPU…

arXiv cs.AI TIER_1 English(EN) · Ruxue Shi, Yili Wang, Mengnan Du, Hangting Ye, Yi Chang, Xin Wang · 2026-06-11 04:00

TAROT: Task-Adaptive Refinement of LLM-prior Graphs for Few-shot Tabular Learning

arXiv:2606.11640v1 Announce Type: cross Abstract: Few-shot tabular learning provides a cost-effective approach for real-world applications where annotation is costly and collecting sufficient samples for new tasks is difficult. Existing traditional and LLM-based methods have demo…

arXiv cs.AI TIER_1 English(EN) · Arther Tian, Alex Ding, Frank Chen, Simon Wu, Aaron Chan · 2026-06-11 04:00

PoQ-Judge: A Multi-Architecture Evaluation Framework for Cost-Aware Proof-of-Quality in Decentralized LLM Inference

arXiv:2606.11196v1 Announce Type: cross Abstract: Decentralized LLM inference networks need lightweight, reference-free quality evaluation for Proof of Quality (PoQ). We present PoQ-Judge, a framework that trains dedicated judge models to score query-output pairs without ground-t…

arXiv cs.CL TIER_1 English(EN) · Feihu Jin, Shipeng Cen, Ying Tan · 2026-06-11 04:00

Steering the Noise: Turning Random Perturbations into Effective Descent for Memory-Efficient LLM Fine-Tuning

arXiv:2601.04710v2 Announce Type: replace Abstract: Fine-tuning large language models (LLMs) achieves strong performance but is often limited by the memory overhead of backpropagation. Zeroth-order (ZO) optimization avoids this overhead by estimating gradients through forward pas…
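
To make the idea of estimating gradients from forward passes concrete, here is a minimal sketch of a standard two-point zeroth-order estimator. It is an assumption that this particular estimator resembles what the paper builds on; the `zo_gradient` helper, the toy quadratic `loss`, and all parameter values are hypothetical.

```python
import random
from typing import Callable, List

def zo_gradient(f: Callable[[List[float]], float], x: List[float],
                eps: float = 1e-3, seed: int = 0) -> List[float]:
    """Two-point zeroth-order estimate: two forward passes, no backpropagation."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in x]              # random perturbation direction
    plus = f([xi + eps * zi for xi, zi in zip(x, z)])
    minus = f([xi - eps * zi for xi, zi in zip(x, z)])
    scale = (plus - minus) / (2.0 * eps)              # directional derivative along z
    return [scale * zi for zi in z]

def loss(w: List[float]) -> float:
    """Toy objective standing in for a model's training loss."""
    return sum(wi * wi for wi in w)

grad_est = zo_gradient(loss, [1.0, -2.0, 0.5])
```

Only function values are needed, so no activations have to be stored for a backward pass; the price is a noisy estimate that depends on the random direction.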

arXiv cs.CL TIER_1 English(EN) · Zhuoyi Peng, Jingzhou Jiang, Hanlin Gu, Lixin Fan, Yi Yang · 2026-06-11 04:00

GraphInfer-Bench: Benchmarking LLM's Inference Capability on Graphs

arXiv:2606.11562v1 Announce Type: cross Abstract: Graph analysis underlies many applications whose answers cannot be looked up in a single record or retrieved along a path: laundering rings, drug repurposing, user preference, and scientific theme are all inferred from a node toge…

arXiv cs.CL TIER_1 English(EN) · Ao Sun · 2026-06-11 04:00

A Controlled Study of Decoding-Time Truthfulness Methods on Instruction-Tuned LLMs

arXiv:2606.12160v1 Announce Type: new Abstract: In this work, we introduce CHAIR (Classifier of Hallucination As ImproveR), a supervised framework for detecting hallucinations by analyzing internal logits from each layer of every token. Our method extracts a compact set of featur…

arXiv cs.AI TIER_1 English(EN) · Mingyi Luo, Ruichen Zhang, Xiangwang Hou, Jun Du, Chunxiao Jiang, Yong Ren, Shiwen Mao · 2026-06-11 04:00

Resource-Aware LLM Reasoning for Mobile Edge General Intelligence

arXiv:2509.23248v3 Announce Type: replace Abstract: The rapid advancement of large language models (LLMs) has enabled the emergence of agentic artificial intelligence (AI) with powerful reasoning and autonomous decision-making capabilities. This integration with edge computing has…

arXiv cs.AI TIER_1 English(EN) · Selen Erkan, Bastian Boll, Kristian Kersting, Björn Deiseroth, Letitia Parcalabescu · 2026-06-11 04:00

Soft-Prompt Tuning for Fair and Efficient LLM Benchmark Evaluation

arXiv:2606.12117v1 Announce Type: cross Abstract: Benchmark scores often misrepresent a large language model's (LLM's) knowledge, because they rely, e.g., on the model's ability to follow specific formatting requirements. This especially penalizes base models that may know the co…

arXiv cs.CL TIER_1 English(EN) · Ao Sun · 2026-06-10 14:48

A Controlled Study of Decoding-Time Truthfulness Methods on Instruction-Tuned LLMs

In this work, we introduce CHAIR (Classifier of Hallucination As ImproveR), a supervised framework for detecting hallucinations by analyzing internal logits from each layer of every token. Our method extracts a compact set of features such as maximum, minimum, mean, standard devi…

arXiv cs.AI TIER_1 English(EN) · Letitia Parcalabescu · 2026-06-10 14:12

Soft-Prompt Tuning for Fair and Efficient LLM Benchmark Evaluation

Benchmark scores often misrepresent a large language model's (LLM's) knowledge, because they rely, e.g., on the model's ability to follow specific formatting requirements. This especially penalizes base models that may know the correct answers but lack the ability -- typically in…

arXiv cs.AI TIER_1 English(EN) · Polydoros Giannouris, Mohsinul Kabir, Sophia Ananiadou · 2026-06-10 04:00

Janus: A Benchmark for Goal-Conditioned Information Distortion in LLMs

arXiv:2606.10852v1 Announce Type: cross Abstract: LLM deception is often evaluated through direct markers such as fabricated claims, explicit lies, or strategic concealment. However, many real-world misleading communications do not depend on false statements; rather, they arise f…

arXiv cs.LG TIER_1 English(EN) · Manel Slokom, Malek Slokom, Thierno Kante · 2026-06-10 04:00

LLM-as-a-Discriminator: When Synthetic Tables Still Look Real

arXiv:2606.09865v1 Announce Type: new Abstract: Privacy and data sharing are often in tension. Many organizations use synthetic data to reduce privacy risk and still share useful data. For tabular data, auditing privacy remains hard. In many cases, even humans cannot easily tell …

arXiv cs.CL TIER_1 English(EN) · Lena S. Bolliger, Lena A. Jäger · 2026-06-10 04:00

Training LLMs to Enforce Multi-Level Instruction Hierarchies via Gravity-Weighted Direct Preference Optimization

arXiv:2606.10860v1 Announce Type: cross Abstract: Production LLMs receive instructions from sources with very different levels of trust, yet attend to every token with uniform architectural privilege. This is the structural vulnerability that enables malicious prompt injections a…

arXiv cs.CL TIER_1 English(EN) · Jaeseong Lee, Seung-won Hwang, Samyam Rajbhandari · 2026-06-10 04:00

SpenseGPT: Practical One-shot Pruning Enabling Sparse and Dense GEMMs for LLM Inference

arXiv:2606.10445v1 Announce Type: cross Abstract: Semi-structured 2:4 sparsity is widely supported by modern accelerators, providing up to a 2x theoretical speedup. However, its strict 50% sparsity constraint often causes non-negligible accuracy degradation under post-training pr…
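
The 2:4 pattern mentioned above keeps at most two nonzero values in every group of four consecutive weights, which is where the fixed 50% sparsity comes from. The sketch below applies that constraint with a simple magnitude criterion; choosing by magnitude is an assumption made for illustration, not the pruning method of the paper, and the `prune_2_4` helper and sample weights are hypothetical.

```python
from typing import List

def prune_2_4(row: List[float]) -> List[float]:
    """Zero the two smallest-magnitude values in every group of four."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    out: List[float] = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        out.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return out

pruned = prune_2_4([0.1, -2.0, 0.5, 3.0, 1.0, 0.0, -0.2, 0.3])
```

Because exactly half of each group is dropped regardless of how important the values are, a group with four large weights still loses two of them, which hints at why accuracy can suffer.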

arXiv cs.CL TIER_1 English(EN) · Ruixuan Huang, Jinyuan Shi, Hantao Huang, Yifan Huang, Ziyi Guan, Hao Zeng, Ian En-Hsu Yen, Minghui Yu · 2026-06-10 04:00

Continual LLM Upcycling: A Predictor-Gated Bank-Wise Sparsity Training Recipe for Dense-to-Sparse LLMs

arXiv:2606.10722v1 Announce Type: new Abstract: We study dense-to-sparse continual training as a way to construct channel-sparse large language models from dense checkpoints. Starting from a Qwen2.5-8B dense backbone, we continue training at 32K context and introduce a predictor-…

arXiv cs.CL TIER_1 English(EN) · Keer Lu, Liwei Chen, Guoqing Jiang, Zhiheng Qin, Yunhuai Liu, Wentao Zhang · 2026-06-10 04:00

REAL: A Reasoning-Enhanced Graph Framework for Long-Term Memory Management of LLMs

arXiv:2606.10694v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly expected to interact with users over long time horizons. However, due to their finite context window, LLMs cannot retain all past interactions, making long-term memory management essenti…

arXiv cs.CL TIER_1 English(EN) · Pratibha Revankar, Kargi Chauhan, Jihye Kim, Sadiba Nusrat Nur, Vincent Siu, Chenguang Wang · 2026-06-10 04:00

MIRAGE: A Polarity-Flipping Encoding Subspace in LLM Agents

arXiv:2606.10304v1 Announce Type: new Abstract: When LLM agents are coerced into covertly encoding sensitive data (Base64, ROT13, acrostic, synonym chains, and beyond), the resulting outputs evade output-side detection but the underlying computation does not. Across nine encoding…

arXiv cs.AI TIER_1 English(EN) · Pietro Cagnasso, Eugene Belilovsky, Edouard Oyallon · 2026-06-10 04:00

Unifying Local Communications and Local Updates for LLM Pretraining

arXiv:2606.11081v1 Announce Type: cross Abstract: Communication-efficient pre-training of LLMs is increasingly important as training draws on compute distributed across clusters, data centers, and lower-bandwidth links. Many practical methods reduce communication frequency but st…

arXiv cs.AI TIER_1 English(EN) · Vanessa Schmidt, Huy Hoang Nguyen, Cédric Jung, Shirin Salehi, Anke Schmeink · 2026-06-10 04:00

Unifying Data, Memory, and Compute Efficiency in LLM training: A Survey

arXiv:2606.10706v1 Announce Type: cross Abstract: Resource constraints increasingly determine what can be trained, fine-tuned, and deployed in large language models (LLMs), yet efficiency is often studied through isolated techniques rather than as an interacting system of limits.…

arXiv cs.AI TIER_1 English(EN) · Huizhen Shu, Xuying Li, Piao Xue · 2026-06-10 04:00

Stop Early, Spend Less: Hidden-State Probes as a Practical Recipe for Streaming Moderation of LLM Outputs

arXiv:2606.10487v1 Announce Type: cross Abstract: Deploying large language models in user-facing systems requires efficient output safety filtering. Existing approaches typically rely on a separate moderation model applied after generation, which doubles inference cost and only d…

arXiv cs.AI TIER_1 English(EN) · Hainiu Xu, Italo Luis da Silva, Jiangnan Ye, Yuhao Wang, Wei Liu, Linyi Yang, Jonathan Richard Schwarz, Nicola Paoletti, Yulan He, Hanqi Yan · 2026-06-10 04:00

PreAct-Bench: Benchmarking Predictive Monitoring in LLMs

arXiv:2606.09890v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly deployed as autonomous agents capable of executing multi-step action trajectories toward a given objective. While existing safety research has focused on detecting unethical behavior f…

arXiv cs.AI TIER_1 English(EN) · Xinrui Chen, Jianhao Zhang, Ou Wu, Di Gao · 2026-06-10 04:00

Two to Tango: Coupled Task-Reference Selection for Safe LLM Fine-tuning

arXiv:2606.09866v1 Announce Type: cross Abstract: Fine-tuning safety-aligned large language models (LLMs) on downstream data improves adaptation but may erode learned safety behavior. Existing methods use fixed safety examples, global constraints, or one-sided task filtering. Our…

arXiv cs.AI TIER_1 English(EN) · Yunhan Jiang, Wenbin Duan, Shasha Guo, Liang Pang, Xiaoqian Sun, Huawei Shen · 2026-06-10 04:00

ActiveMem: Distributed Active Memory for Long-Horizon LLM Reasoning

arXiv:2606.10532v1 Announce Type: new Abstract: Memory is essential for enabling large language model (LLM) agents to handle long-horizon reasoning tasks. Existing memory mechanisms are largely centralized, typically organizing retrieved information and interaction history within…

arXiv cs.LG TIER_1 English(EN) · Guoxia Wang, Shuai Li, Congliang Chen, Jinle Zeng, Jiabin Yang, Dianhai Yu, Yanjun Ma, Li Shen · 2026-06-10 04:00

AdaGC: Enhancing LLM Pretraining Stability via Adaptive Gradient Clipping

arXiv:2502.11034v3 Announce Type: replace Abstract: Loss spikes remain a persistent obstacle in large-scale language model pretraining. While previous research has attempted to identify the root cause of loss spikes by investigating individual factors, we observe that, in practic…

arXiv cs.LG TIER_1 English(EN) · Qingbo Wu, Ke Li, Wenzhu Wang, Jie Yu, Ruian Zhang, Lili Liu · 2026-06-10 04:00

Operator Fusion for LLM Inference on the Tensix Architecture

arXiv:2606.09879v1 Announce Type: new Abstract: This study addresses on-device inference bottlenecks of Transformer models on Tenstorrent's Tensix architecture and proposes an operator fusion strategy that enhances data locality. RMSNorm is fused with matrix multiplication in sel…
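
As an illustration of why RMSNorm can be fused with the matrix multiplication that follows it, the sketch below uses the identity that the per-row scale 1/rms(x) commutes with the matmul, so the gain can be folded into the weights ahead of time and the normalization applied once to the output. This is a plain-Python sketch of the algebra only, not the paper's Tensix kernel; the function names and the `eps` default are assumptions made for the example.

```python
import math


def rmsnorm_then_matmul(x, gain, weight, eps=1e-6):
    """Unfused reference: normalize x, apply the gain, then multiply by weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    normed = [v / rms * g for v, g in zip(x, gain)]
    n_out = len(weight[0])
    return [sum(normed[i] * weight[i][j] for i in range(len(x)))
            for j in range(n_out)]


def fold_gain(gain, weight):
    """Offline step: scale row i of the weight matrix by gain[i]."""
    return [[g * w for w in row] for g, row in zip(gain, weight)]


def fused_rmsnorm_matmul(x, folded_weight, eps=1e-6):
    """Single pass over x: accumulate the sum of squares and the matmul
    together, then apply 1/rms to the output instead of to the input."""
    n_out = len(folded_weight[0])
    acc = [0.0] * n_out
    sum_sq = 0.0
    for i, v in enumerate(x):
        sum_sq += v * v
        row = folded_weight[i]
        for j in range(n_out):
            acc[j] += v * row[j]
    inv_rms = 1.0 / math.sqrt(sum_sq / len(x) + eps)
    return [a * inv_rms for a in acc]
```

The fused version reads `x` once and never materializes the normalized vector, which is the data-locality benefit that fusion is generally after.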

arXiv cs.CL TIER_1 English(EN) · Yi Yang · 2026-06-10 01:41

GraphInfer-Bench: Benchmarking LLM's Inference Capability on Graphs

Graph analysis underlies many applications whose answers cannot be looked up in a single record or retrieved along a path: laundering rings, drug repurposing, user preference, and scientific theme are all inferred from a node together with its neighbourhood. We introduce GraphInf…

arXiv cs.LG TIER_1 English(EN) · Edouard Oyallon · 2026-06-09 16:40

Unifying Local Communications and Local Updates for LLM Pretraining

Communication-efficient pre-training of LLMs is increasingly important as training draws on compute distributed across clusters, data centers, and lower-bandwidth links. Many practical methods reduce communication frequency but still rely on synchronous All-Reduce operations that…

arXiv cs.CL TIER_1 English(EN) · Lena A. Jäger · 2026-06-09 13:39

Training LLMs to Enforce Multi-Level Instruction Hierarchies via Gravity-Weighted Direct Preference Optimization

Production LLMs receive instructions from sources with very different levels of trust, yet attend to every token with uniform architectural privilege. This is the structural vulnerability that enables malicious prompt injections and, more broadly, leaves models without a principl…

arXiv cs.AI TIER_1 English(EN) · Sophia Ananiadou · 2026-06-09 13:31

Janus: A Benchmark for Goal-Conditioned Information Distortion in LLMs

LLM deception is often evaluated through direct markers such as fabricated claims, explicit lies, or strategic concealment. However, many real-world misleading communications do not depend on false statements; rather, they arise from selective treatment of true material facts: om…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-09 11:32

Continual LLM Upcycling: A Predictor-Gated Bank-Wise Sparsity Training Recipe for Dense-to-Sparse LLMs

We study dense-to-sparse continual training as a way to construct channel-sparse large language models from dense checkpoints. Starting from a Qwen2.5-8B dense backbone, we continue training at 32K context and introduce a predictor-gated sparse SwiGLU FFN in the 32K stage. For ea…

arXiv cs.CL TIER_1 English(EN) · Minghui Yu · 2026-06-09 11:32

Continual LLM Upcycling: A Predictor-Gated Bank-Wise Sparsity Training Recipe for Dense-to-Sparse LLMs

We study dense-to-sparse continual training as a way to construct channel-sparse large language models from dense checkpoints. Starting from a Qwen2.5-8B dense backbone, we continue training at 32K context and introduce a predictor-gated sparse SwiGLU FFN in the 32K stage. For ea…

arXiv cs.AI TIER_1 English(EN) · Anke Schmeink · 2026-06-09 11:09

Unifying Data, Memory, and Compute Efficiency in LLM training: A Survey

Resource constraints increasingly determine what can be trained, fine-tuned, and deployed in large language models (LLMs), yet efficiency is often studied through isolated techniques rather than as an interacting system of limits. This survey adopts a constraint-centric perspecti…

arXiv cs.CL TIER_1 English(EN) · Wentao Zhang · 2026-06-09 10:53

REAL: A Reasoning-Enhanced Graph Framework for Long-Term Memory Management of LLMs

Large Language Models (LLMs) are increasingly expected to interact with users over long time horizons. However, due to their finite context window, LLMs cannot retain all past interactions, making long-term memory management essential for storing, updating, and retrieving histori…

arXiv cs.AI TIER_1 English(EN) · Huawei Shen · 2026-06-09 08:03

ActiveMem: Distributed Active Memory for Long-Horizon LLM Reasoning

Memory is essential for enabling large language model (LLM) agents to handle long-horizon reasoning tasks. Existing memory mechanisms are largely centralized, typically organizing retrieved information and interaction history within a single model context. This design imposes a f…

arXiv cs.CL TIER_1 English(EN) · Samyam Rajbhandari · 2026-06-09 05:48

SpenseGPT: Practical One-shot Pruning Enabling Sparse and Dense GEMMs for LLM Inference

Semi-structured 2:4 sparsity is widely supported by modern accelerators, providing up to a 2x theoretical speedup. However, its strict 50% sparsity constraint often causes non-negligible accuracy degradation under post-training pruning. Meanwhile, existing relaxed sparsity format…
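
For readers unfamiliar with the 2:4 format, the sketch below shows what the constraint means in practice: in every contiguous group of four weights, exactly two are kept and two are zeroed, which is where the fixed 50% sparsity comes from. It uses simple magnitude pruning as a stand-in criterion, which is an assumption for illustration and not the SpenseGPT method; `prune_2_of_4` is a hypothetical helper name.

```python
def prune_2_of_4(row):
    """Zero the two smallest-magnitude values in every group of four."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest magnitudes; ties keep the earlier index.
        keep = sorted(range(4), key=lambda i: (-abs(group[i]), i))[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned
```

Because the pattern is forced on every group regardless of how important its weights are, a group with four large weights still loses two of them, which is one way the accuracy degradation mentioned above can arise.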

arXiv cs.AI TIER_1 English(EN) · Zirui Wang, Yusen Hou, Shaofeng Liang, Bowen Tian, Yanlin Zhang, Wenshuo Chen, Yutao Yue · 2026-06-09 04:00

ABLE: Representing and Mapping LLMs via Attribution-Based Large-model Embedding

arXiv:2606.07524v1 Announce Type: cross Abstract: The explosive growth of large language models (LLMs) has created a heterogeneous and poorly documented ecosystem, making systematic model comparison increasingly important for provenance auditing, security analysis, and model sele…

arXiv cs.LG TIER_1 English(EN) · Yifan Niu, Han Xiao, Dongyi Liu, Wei Zhou, Jia Li · 2026-06-09 04:00

Efficient Scaling of LLM Training with Flexible Context Parallelism

arXiv:2602.21788v2 Announce Type: replace-cross Abstract: Scaling long-context capabilities is crucial for Large Language Models (LLMs). However, real-world data contain a large number of sequences with heterogeneous lengths. Existing training libraries for LLMs rely on static pa…

arXiv cs.LG TIER_1 English(EN) · Tuc Nguyen, Thai Le · 2026-06-09 04:00

ATLAS: Verifier-Guided Adaptive Latent Activation Steering for Efficient LLM Reasoning

arXiv:2601.03093v2 Announce Type: replace Abstract: Recent work on activation and latent steering has demonstrated that modifying internal representations can effectively guide large language models (LLMs) toward improved reasoning and efficiency without updating model parameters…

arXiv cs.LG TIER_1 English(EN) · Jiarui Yao, Xiangxin Zhou, Penghui Qi, Wee Sun Lee, Liefeng Bo, Tianyu Pang · 2026-06-09 04:00

Rethinking the Divergence Regularization in LLM RL

arXiv:2606.09821v1 Announce Type: new Abstract: Reinforcement learning (RL) has become a key component of post-training large language models (LLMs). In practice, LLM RL is often off-policy because of training-inference mismatch and policy staleness, making trust-region control e…

arXiv cs.LG TIER_1 English(EN) · Haozhe Hu, Hao Wu, Anhao Zhao, Longwei Ding, Peiran Yin, Yunpu Ma, Xiaoyu Shen · 2026-06-09 04:00

Beyond FLOPs: Benchmarking Real Inference Acceleration of LLM Pruning under a GEMM-Centric Taxonomy

arXiv:2606.09080v1 Announce Type: new Abstract: Pruning has emerged as a dominant paradigm for accelerating large language model (LLM) inference, spanning a broad spectrum of methods that remove computation across tokens, layers, heads, dimensions, and attention patterns. Despite…

arXiv cs.LG TIER_1 English(EN) · Tuc Nguyen, Thai Le · 2026-06-09 04:00

Beyond Linear Activation Steering: Invertible Latent Transformations for Controlling LLM Behavior

arXiv:2606.08454v1 Announce Type: new Abstract: Activation steering provides a lightweight inference-time mechanism for controlling large language models (LLMs) by modifying their internal activation vectors toward desired behaviors. Most existing methods compute a fixed steering…
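
To make the baseline concrete, the sketch below shows linear activation steering with a fixed direction: a vector is added to a hidden activation at inference time, scaled by a strength `alpha`. Deriving the direction as a difference of mean activations between two prompt sets is a common choice assumed here for illustration; it is not the invertible transformation this paper proposes, and all function names are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length activation vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def steering_direction(positive_acts, negative_acts):
    """Fixed direction: mean activation with the behavior minus mean without it."""
    pos = mean_vector(positive_acts)
    neg = mean_vector(negative_acts)
    return [p - q for p, q in zip(pos, neg)]


def steer(hidden, direction, alpha):
    """Shift one hidden activation along the steering direction."""
    return [h + alpha * d for h, d in zip(hidden, direction)]
```

The same direction is applied to every input, which is the "fixed" property that non-linear or input-dependent steering methods aim to relax.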

arXiv cs.LG TIER_1 English(EN) · Zifan Lyu, Chahine Nejma, Tobias Wegel, Fanny Yang, Florian E. Dorner · 2026-06-09 04:00

Cutting LLM Evaluation Costs with SySRs: A Bandit Algorithm that Provably Exploits Model Similarity

arXiv:2606.07726v1 Announce Type: new Abstract: Large Language Models are typically benchmarked by evaluating every model on every test query. For practitioners seeking the best model to deploy, this is often wasteful: if a model clearly performs worse than others, there is no ne…

arXiv cs.AI TIER_1 English(EN) · Vincent-Daniel Yun, Junhyuk Jo, Sai Praneeth Karimireddy, Sunwoo Lee · 2026-06-09 04:00

Ghosted Layers: Unconstrained Activation Alignment for Recovering Layer-Pruned LLMs

arXiv:2605.15491v2 Announce Type: replace-cross Abstract: Layer pruning removes entire Transformer decoder blocks from large language models, but introduces a mismatch between the hidden state received by the next surviving layer and the distribution it was trained to process, le…

arXiv cs.AI TIER_1 English(EN) · Zeju Qiu, Lixin Liu, Adrian Weller, Han Shi, Weiyang Liu · 2026-06-09 04:00

POET-X: Memory-efficient LLM Training by Scaling Orthogonal Transformation

arXiv:2603.05500v2 Announce Type: replace-cross Abstract: Efficient and stable training of large language models (LLMs) remains a core challenge in modern machine learning systems. To address this challenge, Reparameterized Orthogonal Equivalence Training (POET), a spectrum-prese…

arXiv cs.AI TIER_1 English(EN) · Minwei Kong, Ao Qu, Xiaotong Guo, Wenbin Ouyang, Chonghe Jiang, Han Zheng, Yining Ma, Dingyi Zhuang, Yuhan Tang, Junyi Li, Shenhao Wang, Haris Koutsopoulos, Hai Wang, Cathy Wu, Jinhua Zhao · 2026-06-09 04:00

AlphaOPT: Formulating Optimization Programs with Self-Improving LLM Experience Library

arXiv:2510.18428v4 Announce Type: replace Abstract: Optimization modeling underlies critical decision-making across industries, yet remains difficult to automate: natural-language problem descriptions must be translated into precise mathematical formulations and executable solver…

arXiv cs.AI TIER_1 English(EN) · Shijie Zhang, Zheng Xiao, Shiyu Liu, Guohao Sun, Kevin Zhang, Xiang Guo, Rujun Guo, Shaoyu Liu, Wangxiao Zhao, Guanjun Jiang · 2026-06-09 04:00

CLPO: Curriculum Learning meets Policy Optimization for LLM Reasoning

arXiv:2509.25004v2 Announce Type: replace Abstract: Online reinforcement learning with verifiable rewards (RLVR) has become an effective paradigm for improving the reasoning abilities of large language models, but most methods still optimize reasoning trajectories over the static…

arXiv cs.AI TIER_1 English(EN) · Yuhan Ma, Yong Li, Stefan Schmid · 2026-06-09 04:00

FuseFSS: Efficient Secure LLM Inference with Function Secret Sharing

arXiv:2606.09551v1 Announce Type: cross Abstract: Two-server secure inference allows a client to query a hosted large language model (LLM) without revealing prompts or embeddings. Recent GPU systems based on function secret sharing (FSS) make linear layers efficient, but fixed-po…

arXiv cs.AI TIER_1 English(EN) · Hong Guo, Nianhui Guo, Weixing Wang, Jona Otholt, Christoph Meinel, Haojin Yang · 2026-06-09 04:00

APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing

arXiv:2606.08761v1 Announce Type: cross Abstract: W4A4 quantization promises full utilization of INT4 Tensor Cores, yet group dequantization overhead on CUDA Cores has driven existing systems to mixed-precision fallbacks. We present the first systematic study of how intra-SM comp…
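
The group dequantization step mentioned above can be pictured with a minimal sketch of symmetric 4-bit group quantization: each group of values shares one floating-point scale, values are stored as integers in [-8, 7], and recovering real values requires one multiply per element. This is a generic scheme assumed for illustration, not APEX4's kernel design; the function names and the choice of `max_abs / 7` as the scale are assumptions.

```python
def quantize_int4_group(values):
    """Symmetric quantization of one group to integer codes in [-8, 7]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return codes, scale


def dequantize_group(codes, scale):
    """Recover approximate real values: one multiply per stored code."""
    return [c * scale for c in codes]


def quantize_grouped(values, group_size):
    """Split a flat list into groups, each with its own scale."""
    return [quantize_int4_group(values[start:start + group_size])
            for start in range(0, len(values), group_size)]
```

Smaller groups track the local value range more closely but multiply the number of scales that must be applied, which is the overhead that per-group schemes trade against accuracy.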

arXiv cs.AI TIER_1 English(EN) · Zheng Wang, Eric Liu, Linan Jiang, Zhongkai Yu, Zaifeng Pan, Yue Guan, Yuke Wang, Yufei Ding · 2026-06-09 04:00

FlashCP: Load-Balanced Communication-Efficient Context Parallelism for LLM Training

arXiv:2606.08476v1 Announce Type: cross Abstract: Context parallelism (CP) is essential for training large-scale, long-context language models, as it partitions sequences to reduce memory overhead. However, existing CP methods suffer from workload imbalance, inefficient kernels, …

arXiv cs.AI TIER_1 English(EN) · Haochang Hao, Dehai Min, Zhifang Zhang, Yunbei Zhang, Miao Xu, Yingqiang Ge, Lu Cheng · 2026-06-09 04:00

POISE: Position-Aware Undetectable Skill Injection on LLM Agents

arXiv:2606.07943v1 Announce Type: cross Abstract: Agent skills provide a lightweight mechanism for extending general-purpose agents, but their open format exposes them to skill-poisoning attacks. A practically dangerous injection must stay invisible: if executing the payload dera…

arXiv cs.AI TIER_1 English(EN) · Zhanchao Xu, Haoyang Li, Qingfa Xiao, Fei Teng, Chen Jason Zhang, Lei Chen, Qing Li · 2026-06-09 04:00

From Rigid to Dynamic: Entropy-Guided Adaptive Inference for Long-Context LLMs

arXiv:2606.09508v1 Announce Type: new Abstract: Existing sparse attention and KV cache compression methods for long-context LLM inference typically apply fixed sparsity patterns or uniform budgets across all attention heads, overlooking the substantial variation in attention beha…

arXiv cs.AI TIER_1 English(EN) · Shibing Mo, Jing Liu, Jianchu Xu, Ruilin Wu · 2026-06-09 04:00

Order Matters: Unveiling the Hidden Impact of Macro Placement Sequences via Proxy-Guided LLM Evolution

arXiv:2606.08904v1 Announce Type: new Abstract: Macro placement is a fundamental step in modern chip physical design, playing a crucial role in determining the solution quality of high-dimensional combinatorial optimization problems. Despite recent advancements in machine learnin…

arXiv cs.AI TIER_1 English(EN) · Shumeng Yang, Yisu Liu, Jiayi Zheng, Zhaohui Yang, Linjing Li · 2026-06-09 04:00

PAEC: Position-Aware Entropy Calibration for LLM Reasoning in RLVR

arXiv:2606.08543v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) improves large language model reasoning but often suffers from rapid policy-entropy collapse, where the policy prematurely concentrates on narrow high-probability reasoning paths…
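
Policy-entropy collapse is easier to reason about with the quantity in hand. The sketch below computes the Shannon entropy of the next-token distribution at each position from raw logits, so that a drop toward zero at a given position signals the policy concentrating on one continuation. It only shows how the entropy could be measured; it is not the PAEC calibration procedure, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one position's logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution."""
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0.0)


def entropy_by_position(sequence_logits):
    """Entropy at each position of a sequence of logit vectors."""
    return [token_entropy(logits) for logits in sequence_logits]
```

A uniform distribution over a vocabulary of size V gives the maximum value log V, while a sharply peaked distribution gives a value near zero.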

arXiv cs.AI TIER_1 English(EN) · Siyu Lou, Yao Yan, Yuntian Chen, Quanshi Zhang · 2026-06-09 04:00

Cross-LLM Consistency in Inference: Evidence from Shared Interactions

arXiv:2606.08129v1 Announce Type: new Abstract: Large language models (LLMs) differ in architecture, training data, and optimization procedures, yet they may still develop similar internal inference patterns. In this paper, we examine this hypothesis using interaction-based expla…

arXiv cs.CL TIER_1 English(EN) · Chenguang Wang · 2026-06-09 01:45

MIRAGE: A Polarity-Flipping Encoding Subspace in LLM Agents

When LLM agents are coerced into covertly encoding sensitive data (Base64, ROT13, acrostic, synonym chains, and beyond), the resulting outputs evade output-side detection but the underlying computation does not. Across nine encoding families and eight models from five architectur…

arXiv cs.LG TIER_1 English(EN) · Tianyu Pang · 2026-06-08 17:58

Rethinking the Divergence Regularization in LLM RL

Reinforcement learning (RL) has become a key component of post-training large language models (LLMs). In practice, LLM RL is often off-policy because of training-inference mismatch and policy staleness, making trust-region control essential for stable optimization. Mainstream met…

arXiv cs.AI TIER_1 English(EN) · Stefan Schmid · 2026-06-08 14:30

FuseFSS: Efficient Secure LLM Inference with Function Secret Sharing

Two-server secure inference allows a client to query a hosted large language model (LLM) without revealing prompts or embeddings. Recent GPU systems based on function secret sharing (FSS) make linear layers efficient, but fixed-point nonlinearities and helper operations remain a …

arXiv cs.AI TIER_1 English(EN) · Qing Li · 2026-06-08 14:02

From Rigid to Dynamic: Entropy-Guided Adaptive Inference for Long-Context LLMs

Existing sparse attention and KV cache compression methods for long-context LLM inference typically apply fixed sparsity patterns or uniform budgets across all attention heads, overlooking the substantial variation in attention behavior among heads and contexts. We observe two di…

arXiv cs.CL TIER_1 English(EN) · Xiaoyu Shen · 2026-06-08 06:26

Beyond FLOPs: Benchmarking Real Inference Acceleration of LLM Pruning under a GEMM-Centric Taxonomy

Pruning has emerged as a dominant paradigm for accelerating large language model (LLM) inference, spanning a broad spectrum of methods that remove computation across tokens, layers, heads, dimensions, and attention patterns. Despite sharing the same objective, these pruning appro…

arXiv cs.LG TIER_1 English(EN) · Ziyue Li, Yang Li, Tianyi Zhou · 2026-06-08 04:00

Skip a Layer or Loop It? Learning Program-of-Layers in LLMs

arXiv:2606.06574v1 Announce Type: new Abstract: Large language models (LLMs) perform inference through a fixed-depth, fixed-order, non-recurrent execution of all layers. We reveal the wide existence of training-free, flexible, dynamic program-of-layers (PoLar), where pretrained…

arXiv cs.AI TIER_1 English(EN) · Anirudh Sekar, Mrinal Agarwal, Rachel Sharma, Akitsugu Tanaka, Jasmine Zhang, Arjun Damerla, Kevin Zhu · 2026-06-08 04:00

Zero-Shot Embedding Drift Detection: A Lightweight Defense Against Prompt Injections in LLMs

arXiv:2601.12359v1 Announce Type: cross Abstract: Prompt injection attacks have become an increasing vulnerability for LLM applications, where adversarial prompts exploit indirect input channels such as emails or user-generated content to circumvent alignment safeguards and induc…

arXiv cs.CL TIER_1 English(EN) · Yuhang Zhou, Yixin Cao, Guangnan Ye · 2026-06-08 04:00

From Correctness to Utility: Gain-Based Prefix Evaluation for LLM Reasoning

arXiv:2606.07190v1 Announce Type: new Abstract: Reasoning prefixes shape the future trajectory of LLM problem solving, yet existing process reward models usually evaluate them through local step correctness. We argue that correctness is a useful but indirect proxy for the effect …

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-08 01:10

Order Matters: Unveiling the Hidden Impact of Macro Placement Sequences via Proxy-Guided LLM Evolution

Macro placement is a fundamental step in modern chip physical design, playing a crucial role in determining the solution quality of high-dimensional combinatorial optimization problems. Despite recent advancements in machine learning for spatial coordinate determination, the temp…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-08 00:00

Rethinking the Divergence Regularization in LLM RL

DRPO improves LLM reinforcement learning stability by replacing hard masks with smooth regularization that provides continuous gradient corrections beyond trust-region boundaries.

arXiv cs.AI TIER_1 English(EN) · Linjing Li · 2026-06-07 09:51

PAEC: Position-Aware Entropy Calibration for LLM Reasoning in RLVR

Reinforcement learning with verifiable rewards (RLVR) improves large language model reasoning but often suffers from rapid policy-entropy collapse, where the policy prematurely concentrates on narrow high-probability reasoning paths. While global entropy regularization can encour…

arXiv cs.AI TIER_1 English(EN) · Yufei Ding · 2026-06-07 06:45

FlashCP: Load-Balanced Communication-Efficient Context Parallelism for LLM Training

Context parallelism (CP) is essential for training large-scale, long-context language models, as it partitions sequences to reduce memory overhead. However, existing CP methods suffer from workload imbalance, inefficient kernels, and redundant communication due to static sequence…

arXiv cs.CL TIER_1 English(EN) · Thai Le · 2026-06-07 05:01

Beyond Linear Activation Steering: Invertible Latent Transformations for Controlling LLM Behavior

Activation steering provides a lightweight inference-time mechanism for controlling large language models (LLMs) by modifying their internal activation vectors toward desired behaviors. Most existing methods compute a fixed steering direction in the original activation space, typ…

arXiv cs.AI TIER_1 English(EN) · Thibaud Ardoin, Jonas Schäfer, Gerhard Wunder · 2026-06-06 04:00

LLM Self-Recognition: Steering and Retrieving Activation Signatures

arXiv:2606.06315v1 Announce Type: new Abstract: Recent advances in interpretability suggest that large language models (LLMs) implicitly encode signals in their generated text that enable self-recognition of their outputs. We demonstrate that this capability is reliable, even in …

arXiv cs.AI TIER_1 English(EN) · Giuseppe Canonaco, Alberto Pozanco, Daniel Borrajo · 2026-06-06 04:00

Semantic Partial Grounding via LLMs

arXiv:2602.22067v2 Announce Type: replace Abstract: Grounding is a critical step in classical planning, yet it often becomes a computational bottleneck due to the exponential growth in grounded actions and atoms as task size increases. Recent advances in partial grounding have ad…

arXiv cs.AI TIER_1 English(EN) · Rahul Suresh Babu, Laxmipriya Ganesh Iyer · 2026-06-06 04:00

ToolChoiceConfusion: Causal Minimal Tool Filtering for Reliable LLM Agents

arXiv:2606.06284v1 Announce Type: new Abstract: Large language model agents increasingly rely on external tools, but larger tool menus can reduce reliability and efficiency by increasing wrong-tool calls, premature actions, and token cost. Existing tool-selection methods often op…

arXiv cs.AI TIER_1 English(EN) · Nicolás Astorga, Nabeel Seedat, Mihaela van der Schaar · 2026-06-06 04:00

Step-by-Step Optimization-like Reasoning in LLMs over Expanding Search Spaces

arXiv:2606.05464v1 Announce Type: new Abstract: Verifiable reward training has improved mathematical and coding reasoning, but these domains capture only part of step-by-step decision making. Many real-world tasks require finding a high-value feasible plan among many valid altern…

arXiv cs.AI TIER_1 English(EN) · Manya Pandey, Dhruv Kumar, Murari Mandal, Saurabh Deshpande · 2026-06-06 04:00

GITCO: Gated Inference-Time Context Optimization in TSFMs

arXiv:2606.05332v1 Announce Type: new Abstract: Patch-based Time Series Foundation Models (TSFMs) suffer from context poisoning: structurally anomalous patches capture disproportionate attention and silently degrade zero-shot forecast quality. We propose improving TSFM accuracy a…

arXiv cs.CL TIER_1 English(EN) · Lu Cheng · 2026-06-06 02:10

POISE: Position-Aware Undetectable Skill Injection on LLM Agents

Agent skills provide a lightweight mechanism for extending general-purpose agents, but their open format exposes them to skill-poisoning attacks. A practically dangerous injection must stay invisible: if executing the payload derails the user's legitimate task, the resulting fail…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-06 00:00

POISE: Position-Aware Undetectable Skill Injection on LLM Agents

POISE is a stealthy skill-poisoning attack that embeds malicious triggers within benign-looking instructions, achieving high attack success rates while avoiding detection by LLM scanners that are overly sensitive to privileged tool operations.

arXiv cs.CL TIER_1 English(EN) · Guangnan Ye · 2026-06-05 11:56

From Correctness to Utility: Gain-Based Prefix Evaluation for LLM Reasoning

Reasoning prefixes shape the future trajectory of LLM problem solving, yet existing process reward models usually evaluate them through local step correctness. We argue that correctness is a useful but indirect proxy for the effect we ultimately care about: whether a prefix incre…

arXiv cs.CL TIER_1 English(EN) · Yongwei Zhou, Juncheng Diao, Junlin Shang, Peiguang Li, Rongxiang Weng · 2026-06-05 04:00

Predictable Scaling Laws of Optimal Hyperparameters for LLM Continued Pre-training

arXiv:2606.05610v1 Announce Type: new Abstract: The efficacy of continued pre-training for Large Language Models (LLMs) hinges upon hyperparameter configurations, such as learning rate and batch size. However, current practices often rely on heuristics or grid searches, leading t…

arXiv cs.CL TIER_1 English(EN) · Ruoxi Sun, Quantong Qiu, Juntao Li, Zecheng Tang, Yihang Lou, Min Zhang · 2026-06-05 04:00

Mechanistic Insights into Functional Sparsity in Multimodal LLMs via CoRe Heads

arXiv:2606.05843v1 Announce Type: new Abstract: While Multimodal Large Language Models (MLLMs) demonstrate remarkable proficiency on complex vision-language tasks, the mechanisms by which they extract query-relevant visual features from complex, noisy contexts remain opaque. In t…

arXiv cs.LG TIER_1 English(EN) · Jingyao Wu, Ashley Wang, Keane Ong, Paul Pu Liang, Rosalind Picard · 2026-06-05 04:00

SHALA-LLM: Smartly Handling Ambiguous Labels in Aligning LLMs

arXiv:2606.05376v1 Announce Type: new Abstract: Many human-centered tasks, including natural language inference (NLI) and emotion recognition (ER), have multiple plausible interpretations, leading to label ambiguity and challenging disagreements across human annotators. As LLMs a…

arXiv cs.CL TIER_1 English(EN) · Mary Llewellyn, Isobel Thornton, James Bishop, Annie Gray · 2026-06-05 04:00

Correcting Prompt Dependence in LLM Benchmarks: A Bayesian Hierarchical Model with Embedding-Space Clustering

arXiv:2510.05709v2 Announce Type: replace-cross Abstract: LLM benchmarking metrics often misstate performance and uncertainty as they rely on two assumptions that frequently do not hold in practice: (i) a sufficient number of evaluations are available for classical inference, and…

arXiv cs.CL TIER_1 English(EN) · Jiahao Zeng, Ming Tang, Ningning Ding · 2026-06-05 04:00

Learning to Route LLMs from Implicit Cost-Performance Preferences via Meta-Learning

arXiv:2606.06178v1 Announce Type: cross Abstract: Large language models (LLMs) present a trade-off between performance and cost, where more powerful models incur greater expense. LLM routing aims to mitigate expenses while maintaining performance by sending queries to the most su…
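
The cost-performance trade-off behind routing can be sketched as a simple scoring rule: for each candidate model, subtract a weighted cost from a predicted quality score and send the query to the model with the best result. Here the weight is given explicitly, whereas the paper is concerned with preferences that are implicit; the rule, the `route` function, and the model names in the usage note are all assumptions for illustration.

```python
def route(predicted_quality, costs, cost_weight):
    """Return the model name maximizing predicted quality minus weighted cost.

    predicted_quality and costs are dicts keyed by model name; cost_weight
    expresses how much quality the caller would give up per unit of cost.
    """
    return max(
        predicted_quality,
        key=lambda name: predicted_quality[name] - cost_weight * costs[name],
    )
```

With hypothetical scores `{"small": 0.70, "large": 0.90}` and costs `{"small": 1.0, "large": 10.0}`, a weight of 0 picks the large model and a weight of 0.05 picks the small one, so the weight acts as the preference knob.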

arXiv cs.LG TIER_1 English(EN) · Senmiao Wang, Tiantian Fang, Haoran Zhang, Yushun Zhang, Kunxiang Zhao, Alex Schwing, Ruoyu Sun · 2026-06-05 04:00

PC Layer: Polynomial Weight Preconditioning for Improving LLM Pre-Training

arXiv:2606.06470v1 Announce Type: new Abstract: We propose a preconditioning (PC) layer, a weight parameterization via polynomial preconditioner that ensures stable weight conditioning throughout LLM training. The PC module reshapes the singular-value spectrum of weight matrices …

arXiv cs.AI TIER_1 English(EN) · Ruoyu Sun · 2026-06-04 17:55

PC Layer: Polynomial Weight Preconditioning for Improving LLM Pre-Training

We propose a preconditioning (PC) layer, a weight parameterization via polynomial preconditioner that ensures stable weight conditioning throughout LLM training. The PC module reshapes the singular-value spectrum of weight matrices via low-degree polynomial preconditioning. After…

arXiv cs.AI TIER_1 English(EN) · Gerhard Wunder · 2026-06-04 15:54

LLM Self-Recognition: Steering and Retrieving Activation Signatures

Recent advances in interpretability suggest that large language models (LLMs) implicitly encode signals in their generated text that enable self-recognition of their outputs. We demonstrate that this capability is reliable, even in low-entropy scenarios, and that it can be amplif…

arXiv cs.AI TIER_1 English(EN) · Laxmipriya Ganesh Iyer · 2026-06-04 15:24

ToolChoiceConfusion: Causal Minimal Tool Filtering for Reliable LLM Agents

Large language model agents increasingly rely on external tools, but larger tool menus can reduce reliability and efficiency by increasing wrong-tool calls, premature actions, and token cost. Existing tool-selection methods often optimize semantic relevance, exposing tools whose …

arXiv cs.AI TIER_1 English(EN) · Ningning Ding · 2026-06-04 13:53

Learning to Route LLMs from Implicit Cost-Performance Preferences via Meta-Learning

Large language models (LLMs) present a trade-off between performance and cost, where more powerful models incur greater expense. LLM routing aims to mitigate expenses while maintaining performance by sending queries to the most suitable model. However, existing methods cannot per…

arXiv cs.CL TIER_1 English(EN) · Min Zhang · 2026-06-04 08:18

Mechanistic Insights into Functional Sparsity in Multimodal LLMs via CoRe Heads

While Multimodal Large Language Models (MLLMs) demonstrate remarkable proficiency on complex vision-language tasks, the mechanisms by which they extract query-relevant visual features from complex, noisy contexts remain opaque. In this paper, we present an in-depth interpretabili…

arXiv cs.CL TIER_1 English(EN) · Siheng Xiong, Oguzhan Gungordu, James C. Kerce, Faramarz Fekri · 2026-06-04 04:00

Adaptive Information Control for Search-Augmented LLM Reasoning

arXiv:2602.01672v2 Announce Type: replace Abstract: Search-augmented reasoning agents interleave multi-step reasoning with external retrieval, but uncontrolled retrieval can introduce redundant evidence, saturate the context, and destabilize reinforcement learning (RL). Existing …

arXiv cs.AI TIER_1 English(EN) · Yile Gu, Zhen Zhang, Shaowei Zhu, Xinwei Fu, Jun Wu, Yida Wang, Baris Kasikci · 2026-06-04 04:00

Ekka: Automated Diagnosis of Silent Errors in LLM Inference

arXiv:2606.04594v1 Announce Type: cross Abstract: LLM serving frameworks are quickly evolving with a complex software stack and a vast number of optimizations. The rapid development process can introduce silent errors where output quality silently degrades without any explicit er…

arXiv cs.CL TIER_1 English(EN) · Xin Zhang, Yang Cao, Baoxing Wu, Kai Song, Siying Li · 2026-06-04 04:00

Stepwise Reasoning Enhancement for LLMs via External Subgraph Generation

arXiv:2606.04454v1 Announce Type: new Abstract: Large language models have shown strong performance in natural language generation and downstream reasoning tasks, but they still struggle with logical consistency, factual grounding, and interpretability in complex multi-step reaso…

arXiv cs.AI TIER_1 English(EN) · Liulu He, XuanAng Liu, Juntao Liu, Taolue Feng, Ting Lu, Chunsheng Gan, Zhiyv Peng, Yuan Du, Huanrui Yang, Yijiang Liu, Li Du · 2026-06-04 04:00

LiftQuant: Continuous Bit-Width LLM via Dimensional Lifting and Projection

arXiv:2606.04050v1 Announce Type: cross Abstract: Existing quantization methods are fundamentally limited by rigid, integer-based bit-widths (e.g., 2, 3-bit), resulting in a "deployment gap" where Large Language Models cannot be optimally fitted to specific memory budgets. To br…

arXiv cs.CL TIER_1 English(EN) · Changcheng Li, Jiancan Wu, Hengheng Zhang, Zhengsu Chen, Guo An, Junxiang Qiu, Xiang Wang, Qi Tian · 2026-06-04 04:00

Confidence Before Answering: A Paradigm Shift for Efficient LLM Uncertainty Estimation

arXiv:2603.05881v2 Announce Type: replace Abstract: Reliable deployment of large language models (LLMs) requires accurate uncertainty estimation. Existing methods are predominantly answer-first, producing confidence only after generating an answer, which measure the correctness o…

arXiv cs.CL TIER_1 English(EN) · Qinghe Ma, Zhen Zhao, Yiming Wu, Jian Zhang, Lei Bai, Yinghuan Shi · 2026-06-04 04:00

Are Tools Always Beneficial? Learning to Invoke Tools Adaptively for Dual-Mode Multimodal LLM Reasoning

arXiv:2605.19852v2 Announce Type: replace Abstract: Tool-augmented reasoning has emerged as a promising direction for enhancing the reasoning capabilities of multimodal large language models (MLLMs). However, existing studies mainly focus on enabling models to perform tool invoca…

arXiv cs.LG TIER_1 English(EN) · Jack Sanderson, Yihan Wang, Xiaoqian Lu, Gautam Kamath, Yiwei Lu · 2026-06-04 04:00

Sequential Data Poisoning in LLM Post-Training

arXiv:2606.04929v1 Announce Type: new Abstract: LLM post-training proceeds through multiple stages, e.g., supervised fine-tuning (SFT) followed by reinforcement learning from human feedback (RLHF) or direct preference optimization (DPO), where each stage draws data from different…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-04 00:00

Skip a Layer or Loop It? Learning Program-of-Layers in LLMs

Pretrained language models can execute layers dynamically through flexible program-of-layers strategies that improve accuracy while reducing computational overhead compared to standard fixed-depth inference.

arXiv cs.LG TIER_1 English(EN) · Yiwei Lu · 2026-06-03 14:22

Sequential Data Poisoning in LLM Post-Training

LLM post-training proceeds through multiple stages, e.g., supervised fine-tuning (SFT) followed by reinforcement learning from human feedback (RLHF) or direct preference optimization (DPO), where each stage draws data from different, potentially untrusted sources. Existing litera…

arXiv cs.AI TIER_1 English(EN) · Baris Kasikci · 2026-06-03 08:32

Ekka: Automated Diagnosis of Silent Errors in LLM Inference

LLM serving frameworks are quickly evolving with a complex software stack and a vast number of optimizations. The rapid development process can introduce silent errors where output quality silently degrades without any explicit error signals. Diagnosing silent errors is notorious…

arXiv cs.AI TIER_1 English(EN) · Patrick Emami, Nan Qiang, Peter Graf · 2026-06-03 04:00

A Close Look At World Model Recovery In Supervised Fine-Tuned LLM Planners

arXiv:2606.03685v1 Announce Type: cross Abstract: Supervised fine-tuning (SFT) improves end-to-end classical planning in large language models (LLMs), but do these models also learn to represent and reason about the planning problems they are solving? Due to the relative complexi…

arXiv cs.AI TIER_1 English(EN) · Mubarak Adetunji Ojewale · 2026-06-03 04:00

NetKV: Network-Aware Decode Instance Selection for Disaggregated LLM Inference

arXiv:2606.03910v1 Announce Type: cross Abstract: Disaggregated LLM inference forces the KV cache to traverse the datacenter network before decoding begins, so transfer time enters directly into the Time to First Token (TTFT) budget. Current schedulers route on compute load and p…

arXiv cs.AI TIER_1 English(EN) · Qiao Xiao, Alan Ansell, Boqian Wu, Lu Yin, Mykola Pechenizkiy, Shiwei Liu, Decebal Constantin Mocanu · 2026-06-03 04:00

Leave it to the Specialist: Repair Sparse LLMs with Sparse Fine-Tuning via Sparsity Evolution

arXiv:2505.24037v3 Announce Type: replace Abstract: Sparse large language models (LLMs) offer an attractive direction toward efficient deployment, but adapting them to downstream tasks remains challenging. The central difficulty is to enable effective task adaptation without sacr…

arXiv cs.AI TIER_1 English(EN) · Shani Goren, Ido Galil, Ran El-Yaniv · 2026-06-03 04:00

When Should LLMs Be Less Specific? Selective Abstraction for Reliable Long-Form Text Generation

arXiv:2602.11908v3 Announce Type: replace Abstract: LLMs are widely used, yet they remain prone to factual errors that erode user trust and limit adoption in high-risk settings. One approach to mitigate this risk is to equip models with uncertainty estimation mechanisms that abst…

arXiv cs.AI TIER_1 English(EN) · Tianxi Gao, Yufan Cai, Yusi Yuan, Jin Song Dong · 2026-06-03 04:00

X-RAY: Mapping LLM Reasoning Capability via Formalized and Calibrated Probes

arXiv:2603.05290v2 Announce Type: replace Abstract: Large language models (LLMs) achieve promising performance, yet their ability to reason remains poorly understood. Existing evaluations largely emphasize task-level accuracy, often conflating pattern matching with reasoning capa…

arXiv cs.AI TIER_1 English(EN) · Hamid Dadkhahi, Firas Trabelsi, Parker Riley, Juraj Juraska, Mehdi Mirzazadeh · 2026-06-03 04:00

Distribution-Calibrated Inference Time Compute for Thinking LLM-as-a-Judge

arXiv:2512.03019v2 Announce Type: replace-cross Abstract: Thinking Large Language Models (LLMs) used as judges for pairwise preferences remain noisy at the single-sample level, and common aggregation rules (majority vote, soft self-consistency, or instruction-based self-aggregati…

arXiv cs.AI TIER_1 English(EN) · Mehmet Hamza Erol, Xiangpeng Hao, Federico Bianchi, Ciro Greco, Jacopo Tagliabue, James Zou · 2026-06-03 04:00

Test-Time Optimization of Physical Query Plans with LLMs

arXiv:2602.10387v2 Announce Type: replace-cross Abstract: Traditional query optimization relies on cost-based optimizers that estimate execution cost (e.g., runtime, memory, and I/O) using predefined heuristics and statistical models. Improving these requires substantial engineer…

arXiv cs.LG TIER_1 English(EN) · Yunsheng Yuan, Shaowei Li, Kai Wang, Zhongyuan Sun, Zheng Zhang, Kai Han, Jun Luo, Feng Li · 2026-06-03 04:00

DECA: Decentralizing Block-Wise Adam for Efficient LLM Full-Parameter Fine-Tuning on Non-IID Data

arXiv:2606.03209v1 Announce Type: new Abstract: Fine-tuning large language models (LLMs) in privacy-sensitive and resource-constrained environments remains challenging. Since training data are often distributed across multiple clients, decentralized fine-tuning offers a natural p…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-03 00:00

SparDA: Sparse Decoupled Attention for Efficient Long-Context LLM Inference

SparDA is a decoupled sparse attention architecture that improves long-context LLM inference by reducing KV cache bottlenecks and attention complexity through a Forecast projection for lookahead selection.

arXiv cs.AI TIER_1 English(EN) · Mubarak Adetunji Ojewale · 2026-06-02 17:06

NetKV: Network-Aware Decode Instance Selection for Disaggregated LLM Inference

Disaggregated LLM inference forces the KV cache to traverse the datacenter network before decoding begins, so transfer time enters directly into the Time to First Token (TTFT) budget. Current schedulers route on compute load and prefix-cache locality alone, ignoring the topologic…

arXiv cs.AI TIER_1 English(EN) · Peter Graf · 2026-06-02 14:09

A Close Look At World Model Recovery In Supervised Fine-Tuned LLM Planners

Supervised fine-tuning (SFT) improves end-to-end classical planning in large language models (LLMs), but do these models also learn to represent and reason about the planning problems they are solving? Due to the relative complexity of classical planning problems and the challeng…

arXiv cs.AI TIER_1 English(EN) · Weifang Zhang, Yuzhou Nie, Bowen Pang, Guangrui Ma, Shining Wu · 2026-06-02 04:00

Threshold-Based Exclusive Batching for LLM Inference

arXiv:2606.00516v1 Announce Type: new Abstract: Mixed batching (MB), interleaving prefill and decode in a single batch, has become the standard scheduling strategy for large language model (LLM) inference due to its efficiency in maximizing compute and memory utilization. However…

arXiv cs.AI TIER_1 English(EN) · Jiangyu Chen, Banyi · 2026-06-02 04:00

Evidence-Gated LLM Priors for Multi-Objective Bayesian Optimization

arXiv:2606.01730v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used as heuristic advisors for black-box optimization, yet their suggestions and self-reported confidence are not necessarily calibrated to downstream objective values. This issue become…

arXiv cs.AI TIER_1 English(EN) · Thi-Nhung Nguyen, Linhao Luo, Rollin Omari, Junae Kim, Thuy-Trang Vu, Dinh Phung · 2026-06-02 04:00

TriAlign: Towards Universal Truth Consistency in Personalized LLM Alignment

arXiv:2606.01755v1 Announce Type: new Abstract: Personalized large language models adapt responses to users' preferences and social attributes, but can introduce substantial universal truth inconsistencies across social groups, where some groups systematically receive less accura…

arXiv cs.AI TIER_1 English(EN) · Mingyi Wang, Zhuoer Shen, Yuheng Bu, Shaofeng Zou · 2026-06-02 04:00

Detector-Evasive LLM Paraphrasing via Constrained Policy Optimization

arXiv:2606.00392v1 Announce Type: cross Abstract: AI-text detectors are vulnerable to paraphrasing and detector-guided paraphrasing attacks, but existing detector-evasion methods often lack precise control over semantic preservation. In particular, optimizing directly for detecto…

arXiv cs.AI TIER_1 English(EN) · Qiao Xiao, Boqian Wu, Patrik Okanovic, Tomasz Sternal, Maurice van Keulen, Elena Mocanu, Mykola Pechenizkiy, Decebal Constantin Mocanu, Torsten Hoefler · 2026-06-02 04:00

Memory-Efficient LLM Training with Dynamic Sparsity: From Stability to Practical Scaling

arXiv:2606.00888v1 Announce Type: cross Abstract: Dynamic Sparse Training (DST) offers a promising paradigm for improving the training and inference efficiency of deep neural networks; however, we find that in large language model training, DST can suffer from optimization instab…

arXiv cs.AI TIER_1 English(EN) · Gangmuk Lim, Wanyu Zhao, Brighten Godfrey, Jiaxin Shan, Le Xu, Liguang Xie · 2026-06-02 04:00

Lodestar: An Online-Learning LLM Inference Router

arXiv:2606.00946v1 Announce Type: cross Abstract: Efficiently serving large language model (LLM) inference tasks is crucial both for user-perceived latency such as time-to-first-token (TTFT) and for GPU utilization. However, LLM request routing, that is, assigning each inference …

arXiv cs.AI TIER_1 English(EN) · Wentao Mo, Yang Liu · 2026-06-02 04:00

Distilling Neuro-Symbolic Programs into 3D Multi-modal LLMs

arXiv:2606.01215v1 Announce Type: cross Abstract: Current 3D spatial reasoning methods face a fundamental trade-off: neuro-symbolic 3D (NS3D) concept learners achieve interpretable reasoning through compositional programs but are constrained to closed-set concept vocabularies and…

arXiv cs.AI TIER_1 English(EN) · Yixiu Mao, Yun Qu, Qi Wang, Heming Zou, Xiangyang Ji · 2026-06-02 04:00

RLVR without Ineffective Samples: Group Prioritized Off-Policy Optimization for LLM Reasoning

arXiv:2606.01281v1 Announce Type: cross Abstract: Reinforcement learning with verifiable rewards (RLVR) has emerged as a powerful paradigm for enhancing the reasoning capabilities of large language models (LLMs). However, its effectiveness is substantially hindered by the prevale…

arXiv cs.AI TIER_1 English(EN) · Denica Kjorvezir, Marko Djukanović, Ana Gjorgjevikj, Gjorgjina Cenikj, Tome Eftimov · 2026-06-02 04:00

Consistent and Distinctive: LLM Benchmark Efficiency via Maximum Independent Set Prompt Selection on Similarity Graphs

arXiv:2606.01400v1 Announce Type: cross Abstract: Evaluating large language models (LLMs) across comprehensive benchmarks is expensive and time-consuming. We propose a graph-based prompt selection framework that models each benchmark as a similarity graph: nodes are prompts con…

arXiv cs.AI TIER_1 English(EN) · Liu Qing, Ou Wu, Yi Du · 2026-06-02 04:00

AlphaToken: Decoupling Adaptation and Stability for Path-Aware Response Token Valuation in LLM Post-Training

arXiv:2606.01635v1 Announce Type: cross Abstract: Token selection is pivotal for effective LLM post-training. However, existing methods mostly rely on local heuristics and rarely formulate token selection as a principled valuation of individual response tokens. We introduce $\tex…

arXiv cs.AI TIER_1 English(EN) · Juliusz Ziomek, William Bankes, Lorenz Wolf, Shyam Sundhar Ramesh, Xiaohang Tang, Ilija Bogunovic · 2026-06-02 04:00

LLM-WikiRace Benchmark: How Far Can LLMs Plan over Real-World Knowledge Graphs?

arXiv:2602.16902v4 Announce Type: replace Abstract: We introduce LLM-Wikirace, a benchmark for evaluating planning, reasoning, and world knowledge in large language models (LLMs). In LLM-Wikirace, models must efficiently navigate Wikipedia hyperlinks step by step to reach a targe…

arXiv cs.AI TIER_1 English(EN) · Fangzhou Wu, Sandeep Silwal, Qiuyi Zhang · 2026-06-02 04:00

Capturing LLM Capabilities via Evidence-Calibrated Query Clustering

arXiv:2605.17110v2 Announce Type: replace Abstract: Query clustering organizes queries into groups that reflect shared latent capability demands, enabling capability-aware LLM evaluation. Existing clustering methods, which primarily rely on semantic taxonomies or embeddings, ofte…

arXiv cs.AI TIER_1 English(EN) · Maciej Chrabąszcz, Filip Szatkowski, Bartosz Wójcik, Jan Dubiński, Tomasz Trzciński, Sebastian Cygert · 2026-06-02 04:00

Efficient LLM Moderation with Multi-Layer Latent Prototypes

arXiv:2502.16174v4 Announce Type: replace-cross Abstract: Although modern LLMs are aligned with human values during post-training, robust moderation remains essential to prevent harmful outputs at deployment time. Existing approaches suffer from performance-efficiency trade-offs …

arXiv cs.LG TIER_1 English(EN) · Andrei Panferov, Erik Schultheis, Soroush Tabesh, Dan Alistarh · 2026-06-02 04:00

Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation

arXiv:2601.22813v2 Announce Type: replace Abstract: The NVFP4 lower-precision format, supported in hardware by NVIDIA Blackwell GPUs, promises to allow, for the first time, end-to-end fully-quantized pre-training of massive models such as LLMs. Yet, existing quantized training me…

arXiv cs.LG TIER_1 English(EN) · Tuan Nguyen, Long Tran-Thanh · 2026-06-02 04:00

Safety Game: Inference-Time Alignment of Black-Box LLMs via Constrained Optimization

arXiv:2510.09330v3 Announce Type: replace Abstract: Ensuring that large language models (LLMs) comply with safety requirements is a central challenge in AI deployment. Existing alignment approaches primarily operate during training, such as through fine-tuning or reinforcement le…

arXiv cs.LG TIER_1 English(EN) · Kiran Nayudu, Aswini Nutakki, Sai Vinay Naidu, Ashwin Shanmugasundaram · 2026-06-02 04:00

CRMA: A Spectrally-Bounded Backbone for Modular Continual Fine-Tuning of LLMs

arXiv:2606.00382v1 Announce Type: new Abstract: Sequential fine-tuning of large language models forces a choice: let the shared substrate keep learning and accept catastrophic forgetting, or freeze it after task one and foreclose cross-task refinement. Per-task adapter methods (L…

arXiv cs.CL TIER_1 English(EN) · Junjie Chen, Yuxi Dong, Haitao Li, Weihang Su, Yujia Zhou, Min Zhang, Yiqun Liu, Qinyao Ai · 2026-06-02 04:00

Benchmarking LLM-as-a-Judge for Long-Form Output Evaluation

arXiv:2606.01629v1 Announce Type: new Abstract: As large language models (LLMs) are increasingly used for long-form generation, reliably evaluating long-form outputs has become a critical challenge. LLM-as-a-judge offers a scalable alternative to human evaluation, yet its reliabi…

arXiv cs.CL TIER_1 English(EN) · Hanno Hiss, Jasper Dekoninck, Martin Vechev · 2026-06-02 04:00

Learning from Saturated Data: Signals Beyond Correctness for LLM Training

arXiv:2606.01436v1 Announce Type: new Abstract: The growing capabilities of large language models (LLMs) have led to the saturation of many benchmarks and training datasets used to improve them. Motivated by this, we investigate whether questions solved with perfect empirical acc…

arXiv cs.CL TIER_1 English(EN) · Sagar Bhetwal, Rajan Bastakoti, Nirajan Acharya, Gaurav Kumar Gupta · 2026-06-02 04:00

Benchmarking Local LLMs for Natural-Language-to-SQL Querying in Biopharmaceutical Manufacturing: An Empirical Benchmark on Consumer-Grade Hardware

arXiv:2606.01338v1 Announce Type: new Abstract: Biopharmaceutical manufacturing organizations operate under regulatory frameworks such as FDA guidance, EU Good Manufacturing Practice (GMP), and the EU AI Act, which can restrict the use of cloud-based artificial intelligence syste…

arXiv cs.AI TIER_1 English(EN) · Zihan Guan, Rituparna Datta, Mengxuan Hu, Shunshun Liu, Aiying Zhang, Prasanna Balachandran, Sheng Li, Anil Vullikanti · 2026-06-02 04:00

Are LLMs Ready for Neural-integrated Mechanistic Modeling? A Benchmark and Agentic Framework

arXiv:2602.18008v2 Announce Type: replace-cross Abstract: Large language models (LLMs) have shown promise in constructing mechanistic models from data. However, existing evaluations largely focus on simplified settings and fail to capture the complexity of real-world scientific m…

arXiv cs.AI TIER_1 English(EN) · Yi Li, Hongze Shen, Lexiang Tang, Xin Li, Xinpeng Ding, Yinsong Liu, Deqiang Jiang, Xing Sun, Xiaomeng Li · 2026-06-02 04:00

DenseMLLM: Standard Multimodal LLMs for Dense Prediction

arXiv:2602.14134v2 Announce Type: replace-cross Abstract: Multimodal Large Language Models (MLLMs) have demonstrated exceptional capabilities in high-level visual understanding. However, extending these models to fine-grained dense prediction tasks, such as semantic segmentation …

arXiv cs.AI TIER_1 English(EN) · Bogdan Zagribelnyy, Ivan Ilin, Maksim Kuznetsov, Nikita Bondarev, Mathieu Reymond, Roman Schutski, Thomas MacDougall, Rim Shayakhmetov, Zulfat Miftakhutdinov, Mikolaj Mizera, Vladimir Aladinskiy, Alex Aliper, Alex Zhavoronkov · 2026-06-02 04:00

When Single Answer Is Not Enough: Rethinking Single-Step Retrosynthesis Benchmarks for LLMs

arXiv:2602.03554v2 Announce Type: replace-cross Abstract: Recent progress has expanded the use of large language models (LLMs) in drug discovery, including synthesis planning. However, objective evaluation of retrosynthesis performance remains limited. Existing benchmarks and met…

arXiv cs.AI TIER_1 English(EN) · Yu He, Yingxi Li, Colin White, Ellen Vitercik · 2026-06-02 04:00

Can LLMs Reason Structurally? Benchmarking via the Lens of Data Structures

arXiv:2505.24069v4 Announce Type: replace-cross Abstract: Large language models (LLMs) are deployed on increasingly complex tasks that require multi-step decision-making. Understanding their algorithmic reasoning abilities is therefore crucial. However, we lack a diagnostic bench…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-01 05:50

Evidence-Gated LLM Priors for Multi-Objective Bayesian Optimization

Large language models (LLMs) are increasingly used as heuristic advisors for black-box optimization, yet their suggestions and self-reported confidence are not necessarily calibrated to downstream objective values. This issue becomes more pronounced in multi-objective Bayesian op…

arXiv cs.LG TIER_1 English(EN) · Kairun Zhang, Haoyu Li, Yanjun Zhao, Yifan Sun, Huan Zhang · 2026-06-01 04:00

Learning a Zeroth-Order Optimizer for Fine-Tuning LLMs

arXiv:2510.00419v2 Announce Type: replace Abstract: Zeroth-order optimizers have recently emerged as an attractive approach for fine-tuning large language models (LLMs), as they avoid backpropagation and can substantially reduce memory overhead relative to standard first-order tr…

arXiv cs.AI TIER_1 English(EN) · Yuanjian Xu, Jianing Hao, Guang Zhang, Zhong Li · 2026-06-01 04:00

D$^3$: Dynamic Directional Graph-Constrained Data Scheduling for LLM Training

arXiv:2605.31164v1 Announce Type: cross Abstract: Training data plays a central role in large language models (LLMs) optimization, motivating extensive research on data scheduling strategies. Most existing approaches concentrate on adjusting the overall data distribution but negl…

arXiv cs.AI TIER_1 English(EN) · Mikkel Godsk Jørgensen, Lars Kai Hansen · 2026-06-01 04:00

Steering LLMs? Actually, Sparse Autoencoders can outperform simple baselines

arXiv:2605.31183v1 Announce Type: cross Abstract: Sparse Autoencoders (SAEs) have been seen as a promising avenue for exploring the internals of Large Language Models (LLMs) and for steering model output generation. When AxBench - a model steering benchmark - was introduced in Wu…

arXiv cs.AI TIER_1 English(EN) · Azim Ospanov, Zijin Feng, Jiacheng Sun, Haoli Bai, Xin Shen, Farzan Farnia · 2026-06-01 04:00

HERMES: Towards Efficient and Verifiable Mathematical Reasoning in LLMs

arXiv:2511.18760v2 Announce Type: replace Abstract: Informal mathematics has been central to modern large language model (LLM) reasoning, offering flexibility and efficient construction of arguments. However, purely informal reasoning is prone to logical gaps and subtle errors th…

arXiv cs.AI TIER_1 English(EN) · Yuzhe Gu, Xiyu Liang, Jiaojiao Zhao, Enmao Diao · 2026-06-01 04:00

OBCache: Optimal Brain KV Cache Pruning for Efficient Long-Context LLM Inference

arXiv:2510.07651v2 Announce Type: replace-cross Abstract: Large language models (LLMs) with extended context windows enable powerful applications but impose significant memory overhead, as caching all key-value (KV) states scales linearly with sequence length and batch size. Exis…

arXiv cs.AI TIER_1 English(EN) · Saeed Mohammadzadeh, Erfan Hamdi, Joel Shor, Emma Lejeune · 2026-06-01 04:00

FEM-Bench: A Structured Scientific Reasoning Benchmark for Evaluating Code-Generating LLMs

arXiv:2512.20732v2 Announce Type: replace-cross Abstract: As LLMs advance their reasoning capabilities about the physical world, the absence of rigorous benchmarks for evaluating their ability to generate scientifically valid physical models has become a critical gap. Computation…

arXiv cs.AI TIER_1 English(EN) · Sher Badshah, Ali Emami, Hassan Sajjad · 2026-06-01 04:00

SCOPE: Selective Conformal Optimized Pairwise LLM Judging

arXiv:2602.13110v3 Announce Type: replace-cross Abstract: Large language models (LLMs) are increasingly used as scalable judges in pairwise evaluation, but they remain prone to miscalibration and biases. We propose SCOPE (Selective Conformal Optimized Pairwise Evaluation), a fram…

arXiv cs.CL TIER_1 English(EN) · Sander Land, Daniel M. Bikel · 2026-06-01 04:00

Auditing LLM Benchmarks with Item Response Theory

arXiv:2605.30504v1 Announce Type: new Abstract: LLM benchmark labels are frozen at release and silently propagated into downstream benchmarks, errors and all. We introduce an Item Response Theory-based indicator that surfaces likely mislabels at 95% precision in the top 200 examp…

arXiv cs.CL TIER_1 English(EN) · Sicheng Feng, Zigeng Chen, Gongfan Fang, Xinyin Ma, Xinchao Wang · 2026-06-01 04:00

dMoE: dLLMs with Learnable Block Experts

arXiv:2605.30876v1 Announce Type: new Abstract: Diffusion Large Language Models (dLLMs) have recently emerged as a promising alternative to autoregressive models, offering competitive performance while naturally supporting parallel decoding. However, as dLLMs are increasingly int…

arXiv cs.CL TIER_1 English(EN) · Yuanjian Xu, Jianing Hao, Wanbo Zhang, Zhong Li, Guang Zhang · 2026-06-01 04:00

Towards Efficient LLMs Annealing with Principled Sample Selection

arXiv:2605.31175v1 Announce Type: new Abstract: The annealing phase is a pivotal convergence stage in LLM pre-training that ultimately determines final model quality. However, effectively selecting training data during this phase remains a key challenge. Current strategies rely o…

arXiv cs.CL TIER_1 English(EN) · Zheyu Zhang, Shuo Yang, Gjergji Kasneci · 2026-06-01 04:00

Consolidating Rewarded Perturbations for LLM Post-Training

arXiv:2605.31494v1 Announce Type: new Abstract: Post-training of language models is commonly framed as a sample-score-update loop implemented by gradient descent. A recent line of work, exemplified by RandOpt, relocates this loop to weight space, sampling Gaussian perturbations a…

arXiv cs.CL TIER_1 English(EN) · Aniket Anand, Janvijay Singh, Zhewei Sun, Dilek Hakkani-Tür, Nick Feamster · 2026-06-01 04:00

Measuring, Localizing, and Ablating Alignment Signatures in LLMs

arXiv:2605.30526v1 Announce Type: cross Abstract: Aligned language models often exhibit a recognizable AI-like style, yet its connection to post-training and internal representations remains poorly understood. In this work, we study whether post-training introduces or amplifies A…

arXiv cs.CL TIER_1 English(EN) · Quentin Lemesle, Léane Jourdan, Daisy Munson, Pierre Alain, Jonathan Chevelu, Arnaud Delhay, Damien Lolive · 2026-06-01 04:00

*-PLUIE: Personalisable metric with Llm Used for Improved Evaluation

arXiv:2602.15778v2 Announce Type: replace Abstract: Evaluating the quality of automatically generated text often relies on LLM-as-a-judge (LLM-judge) methods. While effective, these approaches are computationally expensive and require post-processing. To address these limitations…

arXiv cs.CL TIER_1 English(EN) · Juneyoung Park, Yuri Hong, Seongwan Kim, Jaeho Lee · 2026-06-01 04:00

Memory-Efficient Structured Backpropagation for On-Device LLM Fine-Tuning

arXiv:2602.13069v2 Announce Type: replace-cross Abstract: On-device fine-tuning enables privacy-preserving personalization of large language models, but mobile devices impose severe memory constraints, typically 6–12GB shared across all workloads. Existing approaches force a tra…

arXiv cs.LG TIER_1 English(EN) · Yuxin Yang, Aoxiong Zeng, Xiangquan Yang · 2026-06-01 04:00

The Long-Term Effects of Data Selection in LLM Fine-Tuning

arXiv:2605.30537v1 Announce Type: new Abstract: Data selection is increasingly used to reduce the cost of large language model (LLM) fine-tuning, with recent methods prioritizing samples by current utility, diversity, quality, or influence. This paper studies a different question…

arXiv cs.AI TIER_1 English(EN) · Stephane Hatgis-Kessell, Emma Brunskill · 2026-06-01 04:00

When are LLMs Sufficient Policy Optimizers for Sequential RL Tasks?

arXiv:2605.30719v1 Announce Type: cross Abstract: We study when large language models (LLMs) can serve as effective black-box policy optimizers for reinforcement learning (RL) tasks, i.e., when can we replace classical RL algorithms with an LLM? We explore this question by introd…

arXiv cs.LG TIER_1 English(EN) · Peihao Wang, Shan Yang, Xijun Wang, Tesi Xiao, Xin Liu, Changlong Yu, Yu Lou, Pan Li, Zhangyang Wang, Ming Lin, René Vidal · 2026-06-01 04:00

Beyond Test-Time Memory: State-Space Optimal Control for LLM Reasoning

arXiv:2603.09221v2 Announce Type: replace Abstract: Associative memory has long underpinned the design of sequential models. Beyond recall, humans reason by projecting future states and selecting goal-directed actions, a capability that modern language models increasingly require…

arXiv cs.AI TIER_1 English(EN) · Liwei Kang, Yee Whye Teh, Wee Sun Lee · 2026-06-01 04:00

LinTree: Improving LLM Reasoning with Explicitly Structured Search Histories

arXiv:2605.31492v1 Announce Type: new Abstract: Large language models (LLMs) often solve reasoning problems by generating intermediate traces that explore and revise partial solutions. From a search perspective, these traces can be viewed as linearized search trees, where the mod…

arXiv cs.AI TIER_1 English(EN) · Vincent Granville · 2026-06-01 04:00

LLMs Without Deep Neural Networks: New Architecture, Benefits and Case Study

arXiv:2605.30385v1 Announce Type: cross Abstract: The purpose of this article is to provide validation to my deep neural network alternative in the context of LLMs. Very recently, there has been a significant interest by Chinese researchers in a model called RBF network, as a sub…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-30 00:00

On the Limits of LLM Adaptability: Impact of Model-Internalized Priors on Annotation Task Performance

Large language models exhibit limited ability to correct zero-shot errors through prompting, with model performance more strongly linked to definition-specific familiarity than text-level memorization metrics.

arXiv cs.CL TIER_1 English(EN) · Gjergji Kasneci · 2026-05-29 16:16

Consolidating Rewarded Perturbations for LLM Post-Training

Post-training of language models is commonly framed as a sample-score-update loop implemented by gradient descent. A recent line of work, exemplified by RandOpt, relocates this loop to weight space, sampling Gaussian perturbations around a pretrained model and ensembling the top-…

arXiv cs.AI TIER_1 English(EN) · Wee Sun Lee · 2026-05-29 16:13

LinTree: Improving LLM Reasoning with Explicitly Structured Search Histories

Large language models (LLMs) often solve reasoning problems by generating intermediate traces that explore and revise partial solutions. From a search perspective, these traces can be viewed as linearized search trees, where the model extends a partial solution, abandons it when …

arXiv cs.AI TIER_1 English(EN) · Lars Kai Hansen · 2026-05-29 11:53

Steering LLMs? Actually, Sparse Autoencoders can outperform simple baselines

Sparse Autoencoders (SAEs) have been seen as a promising avenue for exploring the internals of Large Language Models (LLMs) and for steering model output generation. When AxBench - a model steering benchmark - was introduced in Wu et al. (2025), SAEs did not seem to live up to th…

arXiv cs.CL TIER_1 English(EN) · Guang Zhang · 2026-05-29 11:42

Towards Efficient LLMs Annealing with Principled Sample Selection

The annealing phase is a pivotal convergence stage in LLM pre-training that ultimately determines final model quality. However, effectively selecting training data during this phase remains a key challenge. Current strategies rely on empirical heuristics, such as domain filtering…

arXiv cs.AI TIER_1 English(EN) · Zhong Li · 2026-05-29 11:13

D$^3$: Dynamic Directional Graph-Constrained Data Scheduling for LLM Training

Training data plays a central role in large language models (LLMs) optimization, motivating extensive research on data scheduling strategies. Most existing approaches concentrate on adjusting the overall data distribution but neglect the underlying interactions between samples du…

arXiv cs.LG TIER_1 English(EN) · Youting Wang, Yuan Tang, Bowen Liu, Xuan Liu, Dingyan Shang · 2026-05-29 04:00

When LLM Reward Design Fails: Diagnostic-Driven Refinement for Sparse Structured RL

arXiv:2605.28918v1 Announce Type: new Abstract: For sparse, structured reinforcement-learning tasks with semantic reward-function interfaces, LLM-generated reward shaping is better framed as debugging than one-shot generation. We study PPO-trained agents using MiniGrid as core ev…

arXiv cs.LG TIER_1 English(EN) · Karim Galliamov, Rochelle Choenni, Ivan Titov · 2026-05-29 04:00

Knowledge Offloading: Decomposing LLMs into Sparse Backbones and Memory Modules

arXiv:2605.29075v1 Announce Type: new Abstract: LLMs encode both general capabilities and domain-specific knowledge in a single set of parameters. We ask whether this capacity can be reorganized: keeping broadly useful computation in a shared backbone, while moving specialized kn…

arXiv cs.LG TIER_1 English(EN) · Kexin Chu, Yang Zhou, Wei Zhang · 2026-05-29 04:00

MarginGate: Sparse Margin-Triggered Verification for Batch-Invariant LLM Inference

arXiv:2605.30218v1 Announce Type: new Abstract: Temperature-zero BF16 LLM inference is often treated as reproducible, yet the same request can emit different tokens when decoded alone or inside a larger batch. Existing fixes use batch-invariant operators or LLM-42's per-token ver…

arXiv cs.AI TIER_1 English(EN) · Haoyang Liu, Jie Wang, Boxuan Niu, Xiongwei Han, Yian Xu, Mingxuan Ye, Zijie Geng, Fangzhou Zhu, Tao Zhong, Mingxuan Yuan, Jianye Hao · 2026-05-29 04:00

Opt-Verifier: Unleashing the Power of LLMs for Optimization Modeling via Dual-Side Verification

arXiv:2605.29556v1 Announce Type: new Abstract: Building mathematical optimization models is critical in operations research (OR), while it requires substantial human expertise. Recent advancements have utilized large language models (LLMs) to automate this modeling process. Howe…

arXiv cs.AI TIER_1 English(EN) · Yundong Kim, Heyoung Yang · 2026-05-29 04:00

TRACE: Toulmin-based Reasoning Assessment through Constructive Elements for LLM CoT Evaluation

arXiv:2605.29656v1 Announce Type: new Abstract: Evaluating open-ended outputs from large language models (LLMs) remains challenging due to the absence of ground truth. Existing metrics rely on final-answer accuracy or surface-level statistics, leaving the reasoning process itself…

arXiv cs.AI TIER_1 English(EN) · Tong Ye, Hang Yu, Tengfei Ma, Xuhong Zhang, Jianguo Li, Peng Di, Peiyu Liu, Jianwei Yin, Wenhai Wang · 2026-05-29 04:00

Domain-Specific Data Synthesis for LLMs via Minimal Sufficient Representation Learning

arXiv:2605.30039v1 Announce Type: new Abstract: Large Language Models have demonstrated remarkable progress in general-purpose capabilities and can achieve strong performance in specific domains through fine-tuning on domain-specific data. However, acquiring high-quality data for…

arXiv cs.AI TIER_1 English(EN) · Fares Nabil Ibrahim, Nafis Saami Azad, Raiyan Abdul Baten · 2026-05-29 04:00

Anchorless Diversification for Parallel LLM Ideation

arXiv:2605.30150v1 Announce Type: new Abstract: LLMs are increasingly used to generate candidate-idea pools for creative tasks where broad exploration is valuable. Parallel inference can be attractive in this setting when it broadens the pool while retaining quality and cost effi…

arXiv cs.AI TIER_1 English(EN) · Yalun Dai, Yangyu Huang, Tongshen Yang, Yonghan Wang, Xin Zhang, Wenshan Wu, Qihao Zhao, Hao Li, Yuanyuan Gao, Kim-Hui Yap, Scarlett Li · 2026-05-29 04:00

Demystifying Data Organization for Enhanced LLM Training

arXiv:2605.30334v1 Announce Type: new Abstract: Large Language Models (LLMs) have revolutionized various fields, yet their training efficiency is heavily reliant on effective data curation. While data selection has been widely studied, the strategic data organization for enhanced…

arXiv cs.AI TIER_1 English(EN) · Boqi Chen, José Antonio Hernández López, Aren A. Babikian · 2026-05-29 04:00

Projectional Decoding: Towards Semantic-Aware LLM Generation

arXiv:2605.30054v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly used to generate software artifacts across many software engineering (SE) tasks, yet ensuring the semantic validity of these artifacts remains a fundamental challenge. Existing constra…

arXiv cs.AI TIER_1 English(EN) · Kajetan Schweighofer, Conor F. Hayes, Roberto Dailey, Risto Miikkulainen, Xin Qiu · 2026-05-29 04:00

Overcoming Forgetting in LLM Fine-Tuning with Evolution Strategies

arXiv:2605.30148v1 Announce Type: cross Abstract: Evolution Strategies (ES) has recently emerged as a competitive alternative to reinforcement learning (RL) for large language model (LLM) fine-tuning, offering advantages through simplicity, scalability, and inference-only trainin…

arXiv cs.AI TIER_1 English(EN) · Ziwen Xu, Haiwen Hong, Linsong Yu, Benglei Cui, Longtao Huang, Hui Xue, Ningyu Zhang · 2026-05-29 04:00

How LoRA Remembers? A Parametric Memory Law for LLM Finetuning

arXiv:2605.30260v1 Announce Type: cross Abstract: Large Language Models (LLMs) must continuously learn and update knowledge to remain effective in dynamic real-world environments. While Low-Rank Adaptation (LoRA) is widely used for such memory updates, existing studies mainly rel…

arXiv cs.AI TIER_1 English(EN) · Zhongzhi Li, Xuansheng Wu, Yijiang Li, Lijie Hu, Ninghao Liu · 2026-05-29 04:00

Less is Enough: Synthesizing Diverse Data in LLM Feature Space with Sparse Autoencoders

arXiv:2602.10388v3 Announce Type: replace-cross Abstract: The diversity of post-training data is critical for effective downstream performance in large language models (LLMs). Many existing approaches to constructing post-training data quantify diversity using text-based metrics …

arXiv cs.CL TIER_1 English(EN) · Vinay Samuel, Yapei Chang, Mohit Iyyer · 2026-05-29 04:00

Recovering Diversity Without Losing Alignment: A DPO Recipe for Post-Trained LLMs

arXiv:2605.30021v1 Announce Type: new Abstract: Many open-ended instructions have multiple valid answers that users can benefit from seeing, but post-training often narrows an LLM's output space toward a small set of canonical responses. We introduce REDIPO, an offline DPO data-c…

arXiv cs.CL TIER_1 English(EN) · Jiamin Chen, Yidi Wu, Qiexiang Wang, Qianben Chen, Yuchen Li, Yansen Zhang, Xiaokun Zhang, Wangchunshu Zhou, Chen Ma · 2026-05-29 04:00

SEAL: Can Saturated Benchmarks Be Revived by LLM-as-a-Meta-Judge?

arXiv:2605.30104v1 Announce Type: new Abstract: Widely used language-model benchmarks are increasingly saturated, with frontier systems often receiving near-tied scores that standard metrics cannot resolve. Rather than constructing harder alternatives, we ask whether existing tas…

arXiv cs.CL TIER_1 English(EN) · Shaojie Wang, Liang Zhang · 2026-05-29 04:00

Knowing What to Solve Before How: Preplan Empowered LLM Mathematical Reasoning

arXiv:2605.30245v1 Announce Type: new Abstract: Current plan-based reasoning methods improve large language models (LLMs) by inserting a planning stage before execution, giving rise to the question $\rightarrow$ plan $\rightarrow$ cot paradigm. While effective, a closer examinati…

arXiv cs.CL TIER_1 English(EN) · Haoxiang Jiang, Zihan Dong, Tianci Liu, Wanying Wang, Ran Xu, Tony Yu, Linjun Zhang, Haoyu Wang · 2026-05-29 04:00

RUBRIC-ARROW: Alternating Pointwise Rubric Reward Modeling for LLM Post-training in Non-verifiable Domains

arXiv:2605.29156v1 Announce Type: cross Abstract: Pointwise reward modeling offers critical signals for LLM post-training, yet struggles with absolute scoring in subjective, non-verifiable settings. Rubric-based methods address this by decomposing evaluation into explicit criteri…

arXiv cs.CL TIER_1 English(EN) · Anany Kotawala · 2026-05-29 04:00

Resolution Diagnostics for Paired LLM Evaluation

arXiv:2605.30315v1 Announce Type: new Abstract: Across two public LLM leaderboards, many displayed pairwise rankings do not meet a conventional paired-test resolution target under the actual paired evaluation design: 11 of 40 Open LLM Leaderboard v1 pairwise comparisons and 4 of …

arXiv cs.AI TIER_1 English(EN) · Daniel Lee, Owen Queen, James Zou · 2026-05-29 04:00

ReasonOps: Operator Segmentation for LLM Reasoning Traces

arXiv:2605.29192v1 Announce Type: new Abstract: Chain-of-thought traces from large reasoning models can span tens of thousands of tokens, yet we lack a vocabulary for describing their internal structure. Previous methods developed to analyze chain-of-thought traces are either too…

arXiv cs.AI TIER_1 English(EN) · Zhihao Liu, Yifan Wu, Jian Lou, Di Wang, Yuxi Zhou, Yuke Hu · 2026-05-29 04:00

Aligned but Fragile: Enhancing LLM Safety Robustness via Zeroth-Order Optimization

arXiv:2605.29396v1 Announce Type: new Abstract: Safety alignment for large language models (LLMs) aims to reduce harmful or unsafe behavior while preserving general utility. However, recent findings reveal that alignment effects can be fragile: lightweight post-alignment manipula…

arXiv cs.LG TIER_1 English(EN) · Alaa Khamis, Alaa Maalouf · 2026-05-29 04:00

Efficient Test-Time Finetuning of LLMs via Convex Reconstruction and Gradient Caching

arXiv:2605.30337v1 Announce Type: new Abstract: Test-time finetuning (TTFT) is a rapidly evolving paradigm that adapts a language model to each prompt by retrieving related sequences, updating the model on them, and then evaluating the prompt. However, TTFT is only practical if i…

arXiv cs.LG TIER_1 English(EN) · Xiaowen Jiang, Andrei Semenov, Sebastian U. Stich · 2026-05-29 04:00

Enhancing LLM Training via Spectral Clipping

arXiv:2603.14315v2 Announce Type: replace Abstract: While spectral-based optimizers like Muon operate directly on the spectrum of updates, standard adaptive methods such as AdamW do not account for the spectral structure of weights and gradients, leaving them vulnerable to two em…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-29 00:00

dMoE: dLLMs with Learnable Block Experts

Diffusion large language models combined with mixture-of-experts architectures face a mismatch between block parallel decoding and token-level expert selection, which dMoE addresses by aggregating token-level distributions into block-level routing to reduce activated experts and …

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-29 00:00

Domain-Specific Data Synthesis for LLMs via Minimal Sufficient Representation Learning

DOMINO enables domain-specific data synthesis through an inductive approach that learns domain representations from reference examples, improving code benchmark performance without requiring explicit domain descriptions.

arXiv cs.LG TIER_1 English(EN) · Alaa Maalouf · 2026-05-28 17:59

Efficient Test-Time Finetuning of LLMs via Convex Reconstruction and Gradient Caching

Test-time finetuning (TTFT) is a rapidly evolving paradigm that adapts a language model to each prompt by retrieving related sequences, updating the model on them, and then evaluating the prompt. However, TTFT is only practical if it is fast: selection and finetuning both happen …

arXiv cs.AI TIER_1 English(EN) · Scarlett Li · 2026-05-28 17:58

Demystifying Data Organization for Enhanced LLM Training

Large Language Models (LLMs) have revolutionized various fields, yet their training efficiency is heavily reliant on effective data curation. While data selection has been widely studied, the strategic data organization for enhanced training remains an underexplored area, particu…

arXiv cs.CL TIER_1 English(EN) · Anany Kotawala · 2026-05-28 17:54

Resolution Diagnostics for Paired LLM Evaluation

Across two public LLM leaderboards, many displayed pairwise rankings do not meet a conventional paired-test resolution target under the actual paired evaluation design: 11 of 40 Open LLM Leaderboard v1 pairwise comparisons and 4 of 9 MMLU-Pro top-10 adjacent-rank pairs are unreso…

arXiv cs.AI TIER_1 English(EN) · Ningyu Zhang · 2026-05-28 17:22

How LoRA Remembers? A Parametric Memory Law for LLM Finetuning

Large Language Models (LLMs) must continuously learn and update knowledge to remain effective in dynamic real-world environments. While Low-Rank Adaptation (LoRA) is widely used for such memory updates, existing studies mainly rely on qualitative downstream evaluations, leaving t…

arXiv cs.CL TIER_1 English(EN) · Liang Zhang · 2026-05-28 17:11

Knowing What to Solve Before How: Preplan Empowered LLM Mathematical Reasoning

Current plan-based reasoning methods improve large language models (LLMs) by inserting a planning stage before execution, giving rise to the question $\rightarrow$ plan $\rightarrow$ cot paradigm. While effective, a closer examination reveals an inherent paradigm-level gap: both …

arXiv cs.LG TIER_1 English(EN) · Wei Zhang · 2026-05-28 16:50

MarginGate: Sparse Margin-Triggered Verification for Batch-Invariant LLM Inference

Temperature-zero BF16 LLM inference is often treated as reproducible, yet the same request can emit different tokens when decoded alone or inside a larger batch. Existing fixes use batch-invariant operators or LLM-42's per-token verification, incurring cost even when most steps a…

arXiv cs.AI TIER_1 English(EN) · Raiyan Abdul Baten · 2026-05-28 16:10

Anchorless Diversification for Parallel LLM Ideation

LLMs are increasingly used to generate candidate-idea pools for creative tasks where broad exploration is valuable. Parallel inference can be attractive in this setting when it broadens the pool while retaining quality and cost efficiency. We study inference-time controls for can…

arXiv cs.AI TIER_1 English(EN) · Xin Qiu · 2026-05-28 16:08

Overcoming Forgetting in LLM Fine-Tuning with Evolution Strategies

Evolution Strategies (ES) has recently emerged as a competitive alternative to reinforcement learning (RL) for large language model (LLM) fine-tuning, offering advantages through simplicity, scalability, and inference-only training. However, recent work suggests that ES fine-tuni…

arXiv cs.CL TIER_1 English(EN) · Chen Ma · 2026-05-28 15:46

SEAL: Can Saturated Benchmarks Be Revived by LLM-as-a-Meta-Judge?

Widely used language-model benchmarks are increasingly saturated, with frontier systems often receiving near-tied scores that standard metrics cannot resolve. Rather than constructing harder alternatives, we ask whether existing tasks can be made informative again through improve…

arXiv cs.AI TIER_1 English(EN) · Aren A. Babikian · 2026-05-28 15:05

Projectional Decoding: Towards Semantic-Aware LLM Generation

Large language models (LLMs) are increasingly used to generate software artifacts across many software engineering (SE) tasks, yet ensuring the semantic validity of these artifacts remains a fundamental challenge. Existing constrained decoding techniques can enforce syntactic cor…

arXiv cs.AI TIER_1 English(EN) · Wenhai Wang · 2026-05-28 14:57

Domain-Specific Data Synthesis for LLMs via Minimal Sufficient Representation Learning

Large Language Models have demonstrated remarkable progress in general-purpose capabilities and can achieve strong performance in specific domains through fine-tuning on domain-specific data. However, acquiring high-quality data for target domains remains a significant challenge.…

arXiv cs.CL TIER_1 English(EN) · Mohit Iyyer · 2026-05-28 14:42

Recovering Diversity Without Losing Alignment: A DPO Recipe for Post-Trained LLMs

Many open-ended instructions have multiple valid answers that users can benefit from seeing, but post-training often narrows an LLM's output space toward a small set of canonical responses. We introduce REDIPO, an offline DPO data-construction pipeline for recovering distinct val…

arXiv cs.AI TIER_1 English(EN) · Yuming (Rapheal) Huang, Yao Liu, Lei Wang, Junchen Wan · 2026-05-28 04:00

Let the Results Speak: A Replication-First Paradigm for LLM Behavioral Benchmarking

arXiv:2605.27914v1 Announce Type: cross Abstract: Subjective evaluation of LLM behavior -- empathy, restraint, calibrated emotional tone -- is hard. Human inter-rater agreement on such qualities saturates near rho ~ 0.45, and an LLM-as-judge proxy alone risks circularity: a judge…

arXiv cs.AI TIER_1 English(EN) · Zhenghan Song, Yunyi Li, Yulong Liu · 2026-05-28 04:00

Prefix-Safe Bayesian Belief Tracking for LLM Reasoning Reliability: Separating Calibration from Ranking

arXiv:2605.27712v1 Announce Type: new Abstract: Long reasoning traces need reliability estimates before final answers are known. We study prefix-conditioned eventual-success estimation, $P(y=1 \mid o_{1:t})$, using prefix-safe observations. Sequential Bayesian Belief Tracking (SB…

arXiv cs.AI TIER_1 English(EN) · Hankyeol Kim, Pilsung Kang · 2026-05-28 04:00

Asking Is Not Enough: Protocol Sensitivity in LLM Confidence Calibration

arXiv:2605.27752v1 Announce Type: new Abstract: LLM confidence calibration is often evaluated by comparing two signals: token-probability scores and verbalized confidence. These signals are sometimes treated as direct readouts of model uncertainty, but their comparison depends on…

arXiv cs.AI TIER_1 English(EN) · Bowen Wei, Nan Wang, Yuqing Zhou, Jinhao Pan, Ziwei Zhu · 2026-05-28 04:00

Confidence-Orchestrated Self-Evolution against Uncertain LLM Feedback

arXiv:2605.28010v1 Announce Type: new Abstract: Self-evolving large language models (LLMs) learn by generating their own training tasks and solutions, reducing reliance on human-curated supervision. However, in many reasoning domains, the model must also validate generated tasks …

arXiv cs.AI TIER_1 English(EN) · Pu Li, Jiawen Qi, Qinyu Chen · 2026-05-28 04:00

When NPUs Are Not Always Faster: A Stage-Level Analysis of Mobile LLM Inference

arXiv:2605.27435v1 Announce Type: cross Abstract: Deploying large language models (LLMs) on mobile devices increasingly relies on heterogeneous execution, yet no prior study has systematically characterized NPU effectiveness at the operator and pipeline level. We present the firs…

arXiv cs.AI TIER_1 English(EN) · Yuchao Wu, Wenji Fang, Jing Wang, Wenkai Li, Ziyan Guo, Zhiyao Xie · 2026-05-28 04:00

AssertLLM2: A Comprehensive LLM Benchmark for Assertion Generation from Design Specifications

arXiv:2605.27472v1 Announce Type: cross Abstract: Assertion-based verification (ABV) is a cornerstone of modern hardware design, yet manually translating design intent into formal SystemVerilog Assertions (SVAs) remains labor-intensive and error-prone. While Large Language Models…

arXiv cs.AI TIER_1 English(EN) · Zehao Liu, Yuanpu Cao, Jinghui Chen, Vasant G. Honavar · 2026-05-28 04:00

Restoring the Sweet Spot: Pass-Rate Weighted Self-Distillation for LLM Reasoning

arXiv:2605.27765v1 Announce Type: cross Abstract: Self-Distillation Policy Optimization (SDPO) provides dense token-level credit assignment for reinforcement learning with large language models by leveraging the model's own feedback-conditioned predictions as a self-teacher. Unli…

arXiv cs.AI TIER_1 English(EN) · Hui Yang, Daiwei He, Kevin Jiang, Taejin Park, Kungang Li, Jiajun Luo, Yuying Chen, Xinyi Zhang, Sihan Wang, Haoyu He, Yu Liu, Lakshmi Manoharan, David Xue, Shubham Barhate, Runze Su, Duna Zhan, Ling Leng, Siping Ji, Jinfeng Zhuang, Alice Wu, Leo Lu, Han… · 2026-05-28 04:00

Fine-Tuned LLM as a Complementary Predictor Improving Ads System

arXiv:2605.27856v1 Announce Type: cross Abstract: Recommendation systems power engagement and monetization across feeds, ads, and short-video platforms, but translating the latest advances in Large Language Models into Recommendation Systems (RecSys) gains remains rare, particula…

arXiv cs.LG TIER_1 English(EN) · Zelin Li, Caiwen Ding · 2026-05-28 04:00

LLM Zeroth-Order Fine-Tuning is an Inference Workload

arXiv:2605.28760v1 Announce Type: new Abstract: Zeroth-order (ZO) fine-tuning is attractive for large language models because it replaces backpropagation with forward objective evaluations. Existing implementations nevertheless execute ZO algorithms inside conventional training l…

arXiv cs.AI TIER_1 English(EN) · Kerui Peng, Feifei Li, Xingyu Fan, Wenhui Que · 2026-05-28 04:00

Semantic Flow Regularization: Teaching LLMs to Generate Diverse Yet Coherent Responses

arXiv:2605.27971v1 Announce Type: cross Abstract: When large language models are fine-tuned to generate persona- or tone-conditioned responses, their output diversity is severely limited--a failure we term Cross-Style Collapse. We trace this collapse to the cross-entropy objectiv…

arXiv cs.AI TIER_1 English(EN) · Leonardo Matthew Yauw, Wei-Bin Kou, Yujiu Yang · 2026-05-28 04:00

Integrated and Cross-Architecture Interpretation of LLM Reasoning

arXiv:2605.28006v1 Announce Type: cross Abstract: Understanding how LLMs reason is hindered by a practical asymmetry: while their generated outputs are observable, the underlying reasoning patterns remain opaque. Relying on single probes, such as Mutual Information Peak (MIP) or …

arXiv cs.AI TIER_1 English(EN) · Jiazhen Huang, Xiao Chen, Xiao Luo, Yong Dai, Senkang Hu, Yuzhi Zhao · 2026-05-28 04:00

Skill-Conditioned Gated Self-Distillation for LLM Reasoning

arXiv:2605.28791v1 Announce Type: cross Abstract: On-policy self-distillation (SD) improves LLM reasoning by using teacher-side privileged information (PI) to turn sparse verifier outcomes into dense token-level supervision. Existing methods usually assume trusted PI, such as ref…

arXiv cs.AI TIER_1 English(EN) · Yutong Wang, Pengliang Ji, Chaoqun Yang, Kaixin Li, Ming Hu, Jiaoyang Li, Guillaume Sartoretti · 2026-05-28 04:00

MCTS-Judge: Test-Time Scaling in LLM-as-a-Judge for Code Correctness Evaluation

arXiv:2502.12468v2 Announce Type: replace-cross Abstract: The LLM-as-a-Judge paradigm shows promise for evaluating generative content but lacks reliability in reasoning-intensive scenarios, such as programming. Inspired by recent advances in reasoning models and shifts in scaling…

arXiv cs.AI TIER_1 English(EN) · Pengkai Wang, Pengwei Liu, Qi Zuo, Zhijie Sang, Congkai Xie, Hongxia Yang · 2026-05-28 04:00

InfiMed-ORBIT: Aligning LLMs on Open-Ended Complex Tasks via Rubric-Based Incremental Training

arXiv:2510.15859v4 Announce Type: replace-cross Abstract: Reinforcement learning (RL) has driven recent breakthroughs in large language models (LLMs), especially for tasks where rewards can be computed automatically, such as code generation. However, it is less effective in open-…

arXiv cs.CL TIER_1 English(EN) · Jiayong Wan, Jiawei Chen, Zhaoxia Yin, Liu Shuyuan, Hang Su · 2026-05-28 04:00

LCO: LLM-based Constraint Optimization for Safer Agentic LLMs in Real-world Tasks

arXiv:2605.27375v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly acting as autonomous agents, but their continuous interaction with the environment can lead to in-context reward hacking (ICRH), a phenomenon where LLMs iteratively optimize their behavi…

arXiv cs.CL TIER_1 English(EN) · Pitipat Kongsomjit, Suryansh Goyal, Jacob Whitehill · 2026-05-28 04:00

Learning to Translate from Soft to Hard LLM Prompts

arXiv:2605.27642v1 Announce Type: new Abstract: Soft prompt tuning is a parameter-efficient method for adapting LLMs to specific tasks, but suffers from a lack of interpretability. Building on recent work on interpreting soft prompts (Ramati et al., 2024), we explore how training…

arXiv cs.CL TIER_1 English(EN) · Haihui Pan, Junwei Bao, Hongfei Jiang, Yang Song · 2026-05-28 04:00

FABSVer: Faster Training and Better Self-Verification for LLM Mathematical Reasoning

arXiv:2605.28389v1 Announce Type: new Abstract: While large language models have made significant progress in mathematical reasoning, they remain unreliable at judging the correctness of their own solutions. Existing approaches that equip models with self-verification typically t…

arXiv cs.LG TIER_1 English(EN) · Binh-Nguyen Nguyen, Khang Tran, NhatHai Phan, Issa Khalil · 2026-05-28 04:00

Gradient Transformer: Learning to Generate Updates for LLMs

arXiv:2605.27591v1 Announce Type: new Abstract: Many organizations lack computational resources to fine-tune large language models (LLMs) on private (unshareable) data for better utility, while fine-tuning tiny language models (TinyLMs) alone performs poorly. To address this bott…

arXiv cs.NE (Neural & Evolutionary) TIER_1 English(EN) · Jianguo Zhang · 2026-05-28 03:22

EvoGM: Learning to Merge LLMs via Evolutionary Generative Optimization

Evolutionary model merging provides a powerful framework for the automated, training-free composition of LLMs through parameter-space search. However, existing methods predominantly rely on stochastic, hand-crafted operators that overlook the underlying performance landscape of t…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-28 00:00

How LoRA Remembers? A Parametric Memory Law for LLM Finetuning

Research investigates the quantitative limits of parametric memory in large language models using LoRA as a probe, establishing a power law relationship and developing a threshold-guided optimization method for improved memory performance.

arXiv cs.IR (Information Retrieval) TIER_1 English(EN) · Dingyan Shang · 2026-05-27 17:57

When LLM Reward Design Fails: Diagnostic-Driven Refinement for Sparse Structured RL

For sparse, structured reinforcement-learning tasks with semantic reward-function interfaces, LLM-generated reward shaping is better framed as debugging than one-shot generation. We study PPO-trained agents using MiniGrid as core evaluation and MuJoCo as boundary stress test. Our…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-27 17:49

Skill-Conditioned Gated Self-Distillation for LLM Reasoning

On-policy self-distillation (SD) improves LLM reasoning by using teacher-side privileged information (PI) to turn sparse verifier outcomes into dense token-level supervision. Existing methods usually assume trusted PI, such as reference answers or successful traces. We ask whethe…

arXiv cs.AI TIER_1 English(EN) · Yuzhi Zhao · 2026-05-27 17:49

Skill-Conditioned Gated Self-Distillation for LLM Reasoning

On-policy self-distillation (SD) improves LLM reasoning by using teacher-side privileged information (PI) to turn sparse verifier outcomes into dense token-level supervision. Existing methods usually assume trusted PI, such as reference answers or successful traces. We ask whethe…

arXiv cs.LG TIER_1 English(EN) · Caiwen Ding · 2026-05-27 17:19

LLM Zeroth-Order Fine-Tuning is an Inference Workload

Zeroth-order (ZO) fine-tuning is attractive for large language models because it replaces backpropagation with forward objective evaluations. Existing implementations nevertheless execute ZO algorithms inside conventional training loops, even though their dominant work is repeate…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-27 17:19

LLM Zeroth-Order Fine-Tuning is an Inference Workload

Zeroth-order (ZO) fine-tuning is attractive for large language models because it replaces backpropagation with forward objective evaluations. Existing implementations nevertheless execute ZO algorithms inside conventional training loops, even though their dominant work is repeate…

arXiv cs.CL TIER_1 English(EN) · Yang Song · 2026-05-27 12:26

FABSVer: Faster Training and Better Self-Verification for LLM Mathematical Reasoning

While large language models have made significant progress in mathematical reasoning, they remain unreliable at judging the correctness of their own solutions. Existing approaches that equip models with self-verification typically treat solution generation and verification as two…

arXiv cs.AI TIER_1 English(EN) · Paul Sigloch, Christoph Benzmüller · 2026-05-27 04:00

Neuro-Symbolic Verification of LLM Outputs for Data-Sensitive Domains (extended preprint)

arXiv:2605.26942v1 Announce Type: new Abstract: LLMs deployed in high-stakes domains face fundamental reliability challenges: hallucinations, inconsistencies, and privacy vulnerabilities introduce unacceptable risks where errors carry legal, financial, or safety consequences. Thi…

arXiv cs.AI TIER_1 English(EN) · Allen Nie, Xavier Daull, Zhiyi Kuang, Abhinav Akkiraju, Anish Chaudhuri, Max Piasevoli, Ryan Rong, YuCheng Yuan, Prerit Choudhary, Shannon Xiao, Rasool Fakoor, Adith Swaminathan, Ching-An Cheng · 2026-05-27 04:00

Understanding the Challenges in Iterative Generative Optimization with LLMs

arXiv:2603.23994v2 Announce Type: replace-cross Abstract: Generative optimization uses large language models (LLMs) to iteratively improve artifacts (such as code, workflows or prompts) using execution feedback. It is a promising approach to building self-improving agents, yet in…

arXiv cs.AI TIER_1 English(EN) · Mind Lab: Song Cao, Vic Cao, Andrew Chen, Kaijie Chen, Cleon Cheng, Steven Chiang, Kaixuan Fan, Hera Feng, Huan Feng, Arthur Fu, Jun Gao, Hongquan Gu, Aaron Guan, Nolan Ho, Mutian Hong, Hailee Hou, Peixuan Hua, Charles Huang, Miles Jiang, Nora Jiang,… · 2026-05-27 04:00

MinT: Managed Infrastructure for Training and Serving Millions of LLMs

arXiv:2605.13779v2 Announce Type: replace-cross Abstract: We present MindLab Toolkit (MinT), a managed infrastructure system for Low-Rank Adaptation (LoRA) post-training and online serving. MinT targets a setting where many trained policies are produced over a small number of exp…

arXiv cs.CL TIER_1 English(EN) · Faeze Ghorbanpour, Alexander Fraser · 2026-05-27 04:00

On the Sensitivity of Instruction-tuned LLMs to Harmful Sentences in Long Inputs

arXiv:2510.05864v2 Announce Type: replace Abstract: Large language models (LLMs) increasingly operate on long inputs, yet their behavior when harmful sentences are sparsely embedded within such inputs remains poorly understood. We present a sensitivity analysis that probes how LL…

arXiv cs.AI TIER_1 English(EN) · Corentin Kervadec, Iuliia Lysova, Iuri Macocco, Marco Baroni, Gemma Boleda · 2026-05-27 04:00

Tracing Computation Density in LLMs

arXiv:2605.27033v1 Announce Type: cross Abstract: Transformer-based large language models (LLMs) are comprised of billions of parameters arranged in deep and wide computational graphs, but it is not clear that they exploit their full capacity for all inputs. We introduce the s-Tr…

arXiv cs.AI TIER_1 English(EN) · Xin Huang, Antoni B. Chan · 2026-05-27 04:00

Faithfulness Evaluation for Decoder-only LLM Attributions with Controlled Retained Information

arXiv:2601.03089v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) are increasingly evaluated with input attribution methods, yet comparing such explanations remains challenging. Existing soft-perturbation faithfulness metrics, such as Soft-NC and Soft-NS, can…

arXiv cs.AI TIER_1 English(EN) · Han Jiang, Dongyao Zhu, Zhihua Wei, Xiaoyuan Yi, Ziang Xiao, Xing Xie · 2026-05-27 04:00

PICACO: Pluralistic In-Context Value Alignment of LLMs via Total Correlation Optimization

arXiv:2507.16679v3 Announce Type: replace-cross Abstract: In-Context Learning has shown great potential for aligning Large Language Models (LLMs) with human values, helping reduce harmful outputs and accommodate diverse preferences without costly post-training, known as In-Contex…

arXiv cs.AI TIER_1 English(EN) · Yi Jing, Zao Dai, Jinwu Hu, Zijun Yao, Lei Hou, Juanzi Li, Xiaozhi Wang · 2026-05-27 04:00

Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders

arXiv:2605.27354v1 Announce Type: cross Abstract: Model internals encode rich information about how a large language model (LLM) processes its training data; however, post-training data engineering largely relies on external signals and ignores rich intrinsic signals lying in mod…

arXiv cs.AI TIER_1 English(EN) · Wenhui Tan, Minghao Li, Xiaoqian Ma, Siqi Fan, Xiusheng Huang, Liujie Zhang, Ruihua Song, Weihang Chen · 2026-05-27 04:00

Pair-In, Pair-Out: Latent Multi-Token Prediction for Efficient LLMs

arXiv:2605.27255v1 Announce Type: cross Abstract: Long chain-of-thought reasoning has made autoregressive decoding the dominant inference cost of modern large language models. Existing methods target either the input side (latent compression) or the output side (speculative decod…

arXiv cs.AI TIER_1 English(EN) · Xiongwei Zhu, Xiaojian Liao, Tianyang Jiang, Yusen Zhang, Liang Wang, Limin Xiao · 2026-05-27 04:00

ReMoE: Boosting Expert Reuse through Router Fine-Tuning in Memory-Constrained MoE LLM Inference

arXiv:2605.27081v1 Announce Type: cross Abstract: Fine-grained Mixture-of-Experts (MoE) models sparsely activate only a subset of experts per token, reducing activated computation while maintaining high model capacity. However, in memory-constrained inference scenarios, only a sm…

arXiv cs.CL TIER_1 English(EN) · Ishir Garg, Neel Kolhe, Xuandong Zhao, Dawn Song · 2026-05-27 04:00

InfoSynth: Information-Guided Benchmark Synthesis for LLMs

arXiv:2601.00575v2 Announce Type: replace Abstract: Large language models (LLMs) have demonstrated significant advancements in reasoning and code generation, but efficiently creating new benchmarks to evaluate these capabilities remains a challenge. Traditional benchmark creation…

arXiv cs.AI TIER_1 English(EN) · Adnan Rashid · 2026-05-27 04:00

ReasonOps: A Unified Operational Paradigm for Trustworthy Verified LLM Reasoning

arXiv:2605.27014v1 Announce Type: cross Abstract: Large Language Models (LLMs) have transformed artificial intelligence from primarily generative systems into increasingly capable reasoning agents. Recent advances in theorem proving, autoformalization, symbolic reasoning, and too…

arXiv cs.IR (Information Retrieval) TIER_1 English(EN) · Zhifang Liu · 2026-05-27 02:19

Fine-Tuned LLM as a Complementary Predictor Improving Ads System

Recommendation systems power engagement and monetization across feeds, ads, and short-video platforms, but translating the latest advances in Large Language Models into Recommendation Systems (RecSys) gains remains rare, particularly in advertising and production-scale real-world…

arXiv cs.IR (Information Retrieval) TIER_1 English(EN) · Jiaxuan You · 2026-05-27 01:04

LRanker: LLM Ranker for Massive Candidates

Large language models (LLMs) have recently shown strong potential for ranking by capturing semantic relevance and adapting across diverse domains, yet existing methods remain constrained by limited context length and high computational costs, restricting their applicability to re…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-27 00:00

RUBRIC-ARROW: Alternating Pointwise Rubric Reward Modeling for LLM Post-training in Non-verifiable Domains

RUBRIC-ARROW presents an alternating framework for reward modeling that improves upon rubric-based methods by reducing ties and leveraging pairwise preference data for training.

arXiv cs.AI TIER_1 English(EN) · Xiaozhi Wang · 2026-05-26 17:55

Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders

Model internals encode rich information about how a large language model (LLM) processes its training data; however, post-training data engineering largely relies on external signals and ignores rich intrinsic signals lying in model internals. We propose SAERL, a data engineering…

arXiv cs.AI TIER_1 English(EN) · Weihang Chen · 2026-05-26 16:31

Pair-In, Pair-Out: Latent Multi-Token Prediction for Efficient LLMs

Long chain-of-thought reasoning has made autoregressive decoding the dominant inference cost of modern large language models. Existing methods target either the input side (latent compression) or the output side (speculative decoding and multi-token prediction, MTP), but the two …

arXiv cs.AI TIER_1 English(EN) · Limin Xiao · 2026-05-26 14:32

ReMoE: Boosting Expert Reuse through Router Fine-Tuning in Memory-Constrained MoE LLM Inference

Fine-grained Mixture-of-Experts (MoE) models sparsely activate only a subset of experts per token, reducing activated computation while maintaining high model capacity. However, in memory-constrained inference scenarios, only a small set of experts can be cached. Experts not in t…

arXiv cs.AI TIER_1 English(EN) · Gemma Boleda · 2026-05-26 13:55

Tracing Computation Density in LLMs

Transformer-based large language models (LLMs) are comprised of billions of parameters arranged in deep and wide computational graphs, but it is not clear that they exploit their full capacity for all inputs. We introduce the s-Trace method to efficiently estimate the subgraph of…

arXiv cs.AI TIER_1 English(EN) · Adnan Rashid · 2026-05-26 13:32

ReasonOps: A Unified Operational Paradigm for Trustworthy Verified LLM Reasoning

Large Language Models (LLMs) have transformed artificial intelligence from primarily generative systems into increasingly capable reasoning agents. Recent advances in theorem proving, autoformalization, symbolic reasoning, and tool-augmented language models demonstrate substantia…

arXiv cs.AI TIER_1 English(EN) · Christoph Benzmüller · 2026-05-26 12:32

Neuro-Symbolic Verification of LLM Outputs for Data-Sensitive Domains (extended preprint)

LLMs deployed in high-stakes domains face fundamental reliability challenges: hallucinations, inconsistencies, and privacy vulnerabilities introduce unacceptable risks where errors carry legal, financial, or safety consequences. This paper presents a hybrid verification architect…

arXiv cs.AI TIER_1 English(EN) · Minwei Kong, Chonghe Jiang, Ao Qu, Wenbin Ouyang, Zhaoming Zeng, Xiaotong Guo, Zhekai Li, Junyi Li, Yi Fan, Xinshou Zheng, Xi Jing, Yikai Zhang, Zhiwei Liang, Seonghoo Kim, Runqing Yang, Zijian Zhou, Sirui Li, Han Zheng, Wangyang Ying, Ou Zheng, Chonghua… · 2026-05-26 04:00

FrontierOR: Benchmarking LLMs' Capacity for Efficient Algorithm Design in Large-Scale Optimization

arXiv:2605.25246v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used for optimization modeling and solver-code generation, yet practical operations research and optimization problems often require a harder capability: designing scalable algorithms th…

arXiv cs.LG TIER_1 English(EN) · Haoyu Zheng, Yongqiang Zhang, Fangcheng Fu, Xiaokai Zhou, Hao Luo, Hongchao Zhu, Yuanyuan Zhu, Hao Wang, Xiao Yan, Jiawei Jiang · 2026-05-26 04:00

Scheduling LLM Inference with Uncertainty-Aware Output Length Predictions

arXiv:2604.00499v2 Announce Type: replace Abstract: To schedule LLM inference, the shortest job first (SJF) principle is favorable by prioritizing requests with short output lengths to avoid head-of-line (HOL) blocking. Existing methods usually predict a single output le…
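
The head-of-line blocking that SJF avoids can be illustrated with a minimal sketch. It assumes a single server that handles one request at a time, with service time proportional to output length; real LLM serving batches requests, and this is not the paper's uncertainty-aware method. The function name `mean_wait` and the predicted lengths are hypothetical.

```python
from itertools import accumulate


def mean_wait(lengths):
    """Mean time a request waits before it starts, when requests are
    served one at a time in the given order (simplifying assumption)."""
    # A request starts once all requests ahead of it have finished.
    starts = [0, *accumulate(lengths)][:-1]
    return sum(starts) / len(lengths)


# Hypothetical predicted output lengths (tokens), in arrival order.
predicted = [400, 20, 30, 10]

fifo = mean_wait(predicted)          # the long request blocks the short ones
sjf = mean_wait(sorted(predicted))   # shortest predicted job first

print(f"FIFO mean wait: {fifo}, SJF mean wait: {sjf}")
```

With these numbers the long first request delays every short request under arrival order, while sorting by predicted length lets the short requests finish first. The ordering is only as good as the length predictions, which is the gap the abstract points to.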

arXiv cs.LG TIER_1 English(EN) · Daniel Barley, Jonathan Leis, Benjamin Klenk, Holger Fröning · 2026-05-26 04:00

A Tabular Schedule Abstraction for Communication-Aware Evaluation of Pipeline-Parallel LLM Training

arXiv:2605.24006v1 Announce Type: cross Abstract: Pipeline parallelism is a key technique for distributed training of large language models because it reduces per-device parameter and activation memory. However, comparing pipeline schedules is difficult: analytical models expose …

arXiv cs.LG TIER_1 English(EN) · Zili Zhang, Chengxu Yang, Shenglong Zhang, Chenyu Wang, Yufan Zhang, Tuo Dai, Zhouyang Li, Yuhong Ge, Chao Jin, Xin Jin, Yuliang Liu · 2026-05-26 04:00

BigMac: Breaking the Pareto Frontier of Compute and Memory in Multimodal LLM Training

arXiv:2605.25451v1 Announce Type: new Abstract: Training multimodal large language models (MLLMs) is challenged by both model and data heterogeneity. Existing systems redesign the training pipeline to address these challenges, but remain bound by a Pareto frontier between compute…

arXiv cs.LG TIER_1 English(EN) · Enayat Ullah, Sai Aparna Aketi, Devansh Gupta, Huanyu Zhang, Meisam Razaviyayn · 2026-05-26 04:00

Efficient DP-SGD for LLMs with Randomized Clipping

arXiv:2605.24879v1 Announce Type: new Abstract: Large language models (LLMs) are trained on vast datasets that may contain sensitive information. Differential privacy (DP), the de facto standard for formal privacy guarantees, provides a principled framework for training LLMs with…

arXiv cs.LG TIER_1 English(EN) · Ke Sun, Yizhou Zhao, Jiayi Xin, Qi Long, Weijie Su · 2026-05-26 04:00

CurveRL: Principled Distribution-Aware Context Reweighting for LLM Reasoning

arXiv:2605.24331v1 Announce Type: new Abstract: Context or prompt-level reweighting has emerged as a central algorithmic lever in Reinforcement Learning with Verified Rewards (RLVR) for improving the reasoning capability of large language models, yet the principle determining wha…

arXiv cs.CL TIER_1 English(EN) · Peijie Jiang, Yuqi Feng, Cunyin Peng, Qian Zhao, Jia Liu, KunLong Chen, Zhiqiang Zhang, Jun Zhou · 2026-05-26 04:00

PowLU: An Activation Function for Stable Pre-Training of LLMs

arXiv:2605.25704v1 Announce Type: new Abstract: In contemporary large language models (LLMs), the swish-gated linear unit (SwiGLU) activation function is widely adopted to regulate the information flow and introduce non-linearity. For large positive inputs, SwiGLU approximates th…

arXiv cs.CL TIER_1 English(EN) · Xiangdong Zhang, Debing Zhang, Shaofeng Zhang, Xiaohan Qin, Yu Cheng, Junchi Yan · 2026-05-26 04:00

NITP: Next Implicit Token Prediction for LLM Pre-training

arXiv:2605.24956v1 Announce Type: new Abstract: Standard next-token prediction (NTP) supervises language models solely through discrete labels in the output logit space. We argue that this sparse one-hot supervision leaves the latent representation space under-constrained, allowi…

arXiv cs.AI TIER_1 English(EN) · Siyuan Liu, Tinghong Chen, Xinghan Li, Yifei Wang, Jingzhao Zhang · 2026-05-26 04:00

Data Difficulty and the Generalization–Extrapolation Tradeoff in LLM Fine-Tuning

arXiv:2605.12906v2 Announce Type: replace-cross Abstract: Data selection during supervised fine-tuning (SFT) can critically change the behavior of large language models (LLMs). Although existing work has studied the effect of selecting data based on heuristics such as perplexity,…

arXiv cs.AI TIER_1 English(EN) · Ruishuo Chen, Yu Chen, Zhuoran Li, Longbo Huang · 2026-05-26 04:00

PowerFlow: Unlocking the Dual Nature of LLMs via Principled Distribution Matching

arXiv:2603.18363v2 Announce Type: replace-cross Abstract: Unsupervised Reinforcement Learning from Internal Feedback (RLIF) has emerged as a promising paradigm for eliciting the latent capabilities of Large Language Models (LLMs) without external supervision. However, current met…

arXiv cs.AI TIER_1 English(EN) · Haojie Ouyang, Jianwei Lv, Lei Ren, Chen Wei, Xiaojie Wang, Fangxiang Feng · 2026-05-26 04:00

ChunkLLM: A Lightweight Pluggable Framework for Accelerating LLMs Inference

arXiv:2510.02361v2 Announce Type: replace-cross Abstract: Transformer-based large models excel in natural language processing and computer vision, but face severe computational inefficiencies due to the self-attention's quadratic complexity with input tokens. Recently, researcher…

arXiv cs.AI TIER_1 English(EN) · Zhuchen Cao, Sven Apel, Adish Singla, Vera Demberg · 2026-05-26 04:00

Pragmatic Reasoning improves LLM Code Generation

arXiv:2502.15835v5 Announce Type: replace-cross Abstract: Pragmatic reasoning helps interlocutors infer intended meaning from ambiguous or underspecified messages by considering shared context and counterfactual alternatives. Similar challenges arise in natural language-to-code g…

arXiv cs.AI TIER_1 English(EN) · Akira Okutomi · 2026-05-26 04:00

False Fixed Points: Kantian Feedback, Stable Miscalibration, and Representational Compression in LLMs

arXiv:2510.14925v4 Announce Type: replace Abstract: High-confidence errors in large language models are often treated as fragile failures. We study an alternative: some errors may be false fixed points, locally stable, internally coherent, and confidently wrong. This separates ro…

arXiv cs.AI TIER_1 English(EN) · Parth Darshan, Abhishek Divekar · 2026-05-26 04:00

When Gradients Collide: Failure Modes of Multi-Objective Prompt Optimization for LLM Judges

arXiv:2605.26046v1 Announce Type: cross Abstract: Customizing an LLM judge to a specific task or domain often involves optimizing its prompt across multiple evaluation criteria simultaneously. Textual gradient methods automate this for a single judge criterion, however they produ…

arXiv cs.AI TIER_1 English(EN) · Muyu Pan, Shu Zhao, Nan Zhang, Philip Shin, Varun Parekh, Vijaykrishnan Narayanan, Rui Zhang · 2026-05-26 04:00

TIAR: Trajectory-Informed Advantage Reweighting for LLM Abstention Learning

arXiv:2605.25850v1 Announce Type: cross Abstract: This paper investigates large language model (LLM) abstention learning, specifically using ternary reward, which incentivizes truthfulness in large language models. This paper extends that idea by moving from a ternary reward to a …

arXiv cs.AI TIER_1 English(EN) · Haoran Gu, Handing Wang, Yi Mei, Mengjie Zhang · 2026-05-26 04:00

AutoSG: LLM-Driven Solver Generation Solely from Task Prompts for Expensive Optimization

arXiv:2605.25658v1 Announce Type: cross Abstract: Expensive optimization tasks are ubiquitous in real-world applications, demanding highly specialized solvers. While LLM-driven automated solver generation shows promise, current paradigms face three critical issues when tackling e…

arXiv cs.AI TIER_1 English(EN) · Xiangtian Ji, Yuxin Chen, Zhengzhou Cai, Xiang Wang, An Zhang, Tat-Seng Chua · 2026-05-26 04:00

Tiny Brains, Giant Impact: Uncovering the Keystone Neurons of LLM with Just a Few Prompts

arXiv:2605.24846v1 Announce Type: cross Abstract: Large language models (LLMs) display strong comprehensive abilities, yet the internal mechanisms that support these behaviors remain insufficiently understood. In this work, we show that across a wide range of open-weight Transfor…

arXiv cs.AI TIER_1 English(EN) · Jaeung Lee, Dohyun Kim, Jaemin Jo · 2026-05-26 04:00

Measuring the Depth of LLM Unlearning via Activation Patching

arXiv:2605.24614v1 Announce Type: cross Abstract: Large language model (LLM) unlearning has emerged as a crucial post-hoc mechanism for privacy protection and AI safety, yet auditing whether target knowledge is truly erased remains challenging. Existing output-level metrics fail …

arXiv cs.AI TIER_1 English(EN) · Haizhou Xia · 2026-05-26 04:00

Guarded Repair for Harm-Aware Post-hoc Replacement of LLM Mathematical Reasoning

arXiv:2605.24613v1 Announce Type: cross Abstract: Post-hoc repair of LLM mathematical reasoning introduces an asymmetric risk: fixing an incorrect reasoning trace is useful, but replacing a trace that was already correct can be harmful. We study this problem under a selective rep…

arXiv cs.AI TIER_1 English(EN) · João Sedoc, Baotong Zhang, Dean Foster · 2026-05-26 04:00

Trust but Verify: Prover-Verifier Deliberation for Selective LLM Prediction

arXiv:2605.25133v1 Announce Type: new Abstract: Reliably knowing when a language model is correct is almost as important as being correct. We introduce prover-verifier deliberation (PVD), an inference-time protocol grounded in interactive proof theory, as a mechanism for selectiv…

arXiv cs.AI TIER_1 English(EN) · Jingchu Gai, Guanning Zeng, Christina Baek, Chen Wu, J. Zico Kolter, Andrej Risteski, Aditi Raghunathan · 2026-05-26 04:00

Understanding and Mitigating Premature Confidence for Better LLM Reasoning

arXiv:2605.24396v1 Announce Type: new Abstract: Long chains of thought (CoT) from current language models frequently contain logical gaps and unjustified leaps, limiting the gains from additional test-time compute. Improving reasoning quality directly would require process reward…

arXiv cs.AI TIER_1 English(EN) · Ashok Chandrasekar, Jason Kramberger · 2026-05-26 04:00

Identifying and Mitigating Systemic Measurement Bias in Production LLM Inference Benchmarks

arXiv:2605.24217v1 Announce Type: new Abstract: As Large Language Models (LLMs) transition from research environments to production deployments, evaluating their performance against strict Service Level Objectives (SLOs) has become critical. However, current evaluation methodolog…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-26 00:00

Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders

SAERL uses Sparse Autoencoder-derived signals from model internals to enhance LLM reinforcement learning through diversity control, difficulty-aware curriculum learning, and quality-based data filtering.

arXiv cs.AI TIER_1 English(EN) · Abhishek Divekar · 2026-05-25 17:08

When Gradients Collide: Failure Modes of Multi-Objective Prompt Optimization for LLM Judges

Customizing an LLM judge to a specific task or domain often involves optimizing its prompt across multiple evaluation criteria simultaneously. Textual gradient methods automate this for a single judge criterion, however they produce natural-language critiques, not numerical vecto…

arXiv cs.AI TIER_1 English(EN) · Rui Zhang · 2026-05-25 13:42

TIAR: Trajectory-Informed Advantage Reweighting for LLM Abstention Learning

This paper investigates large language model (LLM) abstention learning, specifically using ternary reward, which incentivizes truthfulness in large language models. This paper extends that idea by moving from a ternary reward to a Trajectory-Informed advantage reweighting, dynamic…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-25 13:42

TIAR: Trajectory-Informed Advantage Reweighting for LLM Abstention Learning

This paper investigates large language model (LLM) abstention learning, specifically using ternary reward, which incentivizes truthfulness in large language models. This paper extends that idea by moving from a ternary reward to a Trajectory-Informed advantage reweighting, dynamic…

arXiv cs.LG TIER_1 English(EN) · Jun Zhou · 2026-05-25 11:02

PowLU: An Activation Function for Stable Pre-Training of LLMs

In contemporary large language models (LLMs), the swish-gated linear unit (SwiGLU) activation function is widely adopted to regulate the information flow and introduce non-linearity. For large positive inputs, SwiGLU approximates the quadratic function x^2, providing strong non…
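
As a rough illustration of the quadratic behaviour mentioned above, consider a scalar simplification in which the gate and value projections of SwiGLU are both the identity, so the unit reduces to the input multiplied by its own Swish. This simplification is an assumption made here for illustration, not the paper's setup.

```latex
\[
\operatorname{SwiGLU}(x) = \operatorname{Swish}(x)\cdot x = x^{2}\,\sigma(x),
\qquad \sigma(x) = \frac{1}{1+e^{-x}} .
\]
\[
\sigma(x) \to 1 \ \text{as}\ x \to +\infty
\quad\Longrightarrow\quad
\operatorname{SwiGLU}(x) \approx x^{2}\ \text{for large positive } x .
\]
```

Under this simplification the output grows quadratically for large positive inputs, which is the regime the abstract refers to.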

arXiv cs.AI TIER_1 English(EN) · Mengjie Zhang · 2026-05-25 10:04

AutoSG: LLM-Driven Solver Generation Solely from Task Prompts for Expensive Optimization

Expensive optimization tasks are ubiquitous in real-world applications, demanding highly specialized solvers. While LLM-driven automated solver generation shows promise, current paradigms face three critical issues when tackling expensive optimization: factual hallucinations due …

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-25 10:04

AutoSG: LLM-Driven Solver Generation Solely from Task Prompts for Expensive Optimization

Expensive optimization tasks are ubiquitous in real-world applications, demanding highly specialized solvers. While LLM-driven automated solver generation shows promise, current paradigms face three critical issues when tackling expensive optimization: factual hallucinations due …

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-25 08:26

AnE: Pushing the Reasoning Frontier of Multimodal LLMs via Anchor Evolution

Post-training via Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) is crucial for enhancing reasoning in Multimodal Large Language Models (MLLMs), yet existing paradigms often reach a performance bottleneck due to the limitations of static data. While current methods …

arXiv cs.AI TIER_1 English(EN) · Zihao Jing, Qiuhao Zeng, Ruiyi Fang, Yan Yi Li, Yan Sun, Boyu Wang, Pingzhao Hu · 2026-05-25 04:00

Scaling-Aware Adapter for Structure-Grounded LLM Reasoning

arXiv:2602.02780v3 Announce Type: replace Abstract: Large language models (LLMs) are enabling reasoning over 2D and 3D structures, yet existing methods remain modality-specific and typically compress structural inputs through sequence-based tokenization or fixed-length query conn…

arXiv cs.AI TIER_1 English(EN) · Thanh Q. Tran, Arun Verma, Kiwan Wong, Bryan Kian Hsiang Low, Daniela Rus, Wei Xiao · 2026-05-25 04:00

BarrierSteer: LLM Safety via Learning Barrier Steering

arXiv:2602.20102v2 Announce Type: replace-cross Abstract: Despite the strong performance of large language models (LLMs) across diverse tasks, their susceptibility to adversarial attacks and unsafe content generation remains a significant obstacle to deployment, particularly in h…

arXiv cs.AI TIER_1 English(EN) · Luze Sun, Anshuman Suri, Harsh Chaudhari, Cristina Nita-Rotaru, Alina Oprea · 2026-05-25 04:00

PoisonForge: Task-Level Targeted Poisoning Benchmark for Instruction-Tuned LLMs

arXiv:2605.23168v1 Announce Type: cross Abstract: When practitioners fine-tune LLMs on unvetted datasets, an adversary can exploit the data supply chain through task-level poisoning: inserting a small number of crafted instruction-response pairs that cause the model to embed atta…

arXiv cs.AI TIER_1 English(EN) · Chuyifei Zhang, Hongyu Cui, Xiaowen Huang, Jitao Sang · 2026-05-25 04:00

Positional Failures in Long-Context LLMs: A Blind Spot in Reasoning Benchmarks

arXiv:2605.23170v1 Announce Type: cross Abstract: Position-controlled evaluation is standard for retrieval tasks such as Needle-in-a-Haystack and RULER, but mainstream reasoning benchmarks do not control positional placement of target tasks in long contexts. We audit 11 long-cont…

arXiv cs.AI TIER_1 English(EN) · Ziyue Liu, Zhengyang Wang, Ruijie Zhang, Avinash Maurya, Hui Zhou, Paul Hovland, Sheng Di, Franck Cappello, Bogdan Nicolae, Zheng Zhang · 2026-05-25 04:00

ReCoVer: Resilient LLM Pre-Training System via Fault-Tolerant Collective and Versatile Workload

arXiv:2605.11215v2 Announce Type: replace-cross Abstract: Pre-training large language models on massive GPU clusters has made hardware faults routine rather than rare, driving the need for resilient training systems. Yet existing frameworks either focus on specific parallelism sc…

arXiv cs.LG TIER_1 English(EN) · Wei Lin, Yining Jiang, Qingyu Song, Qiao Xiang, Hong Xu · 2026-05-25 04:00

AGZO: Activation-Guided Zeroth-Order Optimization for LLM Fine-Tuning

arXiv:2601.17261v4 Announce Type: replace Abstract: Zeroth-Order (ZO) optimization has emerged as a promising solution for fine-tuning LLMs under strict memory constraints, as it avoids the prohibitive memory cost of storing activations for backpropagation. However, existing ZO m…

arXiv cs.LG TIER_1 English(EN) · Mohammad R. Rezaei, Rahul G. Krishnan · 2026-05-25 04:00

From Residuals to Reasons: LLM-Guided Mechanism Inference from Tabular Data

arXiv:2605.22897v1 Announce Type: new Abstract: A persistent challenge in machine learning for scientific applications is jointly achieving prediction and understanding. Statistical models excel on structured data but operate as black boxes, while existing interpretability method…

arXiv cs.AI TIER_1 English(EN) · Sixing Chen, Ji-An Li, Saner Cakir, Sinan Akcali, Kayla Lee, Marcelo G. Mattar · 2026-05-25 04:00

Extracting Search Trees from LLM Reasoning Traces Reveals Myopic Planning

arXiv:2605.06840v5 Announce Type: replace Abstract: Large language models (LLMs), especially reasoning models, generate extended chain-of-thought (CoT) reasoning that often contains explicit deliberation over future outcomes. Yet whether this deliberation constitutes genuine plan…

arXiv cs.AI TIER_1 English(EN) · Yiwen Duan, Jing Ye, Xinpei Zhao · 2026-05-25 04:00

ALIVE: Awakening LLM Reasoning via Adversarial Learning and Instructive Verbal Evaluation

arXiv:2602.05472v2 Announce Type: replace Abstract: The quest for expert-level reasoning in Large Language Models (LLMs) has been hampered by a persistent reward bottleneck: traditional reinforcement learning (RL) relies on scalar rewards that are costly to scal…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-25 00:00

When Gradients Collide: Failure Modes of Multi-Objective Prompt Optimization for LLM Judges

Multi-objective LLM judge customization using textual gradients faces challenges from gradient dilution and instruction interference that limit optimization effectiveness.

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-24 00:00

NITP: Next Implicit Token Prediction for LLM Pre-training

Next Implicit Token Prediction enhances language model training by adding dense continuous supervision in representation space, improving generalization and performance across model sizes with minimal computational overhead.

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-23 00:00

Measuring the Depth of LLM Unlearning via Activation Patching

A new metric called Unlearning Depth Score (UDS) is introduced to evaluate how thoroughly knowledge has been removed from large language models, addressing limitations of previous methods that could not detect hidden knowledge in internal representations.

arXiv cs.LG TIER_1 English(EN) · Jialin Chen, Aosong Feng, Harshit Verma, Siyi Gu, Haiwen Wang, Ali Maatouk, Yixuan He, Yifeng Gao, Leandros Tassiulas, Rex Ying · 2026-05-22 04:00

Reasoning through Verifiable Forecast Actions: Consistency-Grounded RL for Financial LLMs

arXiv:2605.21975v1 Announce Type: new Abstract: Financial markets are characterized by extreme non-stationarity, low signal-to-noise ratios, and strong dependence on external information such as news, company fundamentals, and macroeconomic signals. Yet, existing approaches eithe…

arXiv cs.LG TIER_1 English(EN) · Shuo Yang, Jinda Lu, Kexin Huang, Chiyu Ma, Shaohang Wei, Yuyang Liu, Guoyin Wang, Jingren Zhou, Li Yuan · 2026-05-22 04:00

One-Way Policy Optimization for Self-Evolving LLMs

arXiv:2605.22156v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has become a promising paradigm for scaling reasoning capabilities of Large Language Models (LLMs). However, the sparsity of binary verifier rewards often leads to low efficiency…

arXiv cs.LG TIER_1 English(EN) · Manuel Noah Riesen, Peter Alfred von Niederhäusern · 2026-05-22 04:00

Reinforced Graph of Thoughts: RL-Driven Adaptive Prompting for LLMs

arXiv:2605.22195v1 Announce Type: new Abstract: Graph of Thoughts (GoT), a generalized form of recent prompting paradigms for large language models (LLMs), has been shown to be useful for elaborate problem solving. By executing a graph of operations, thoughts of the LLM are struc…

arXiv cs.LG TIER_1 English(EN) · Di He, Songjun Tu, Keyu Wang, Lu Yin, Shiwei Liu · 2026-05-22 04:00

One LR Doesn't Fit All: Heavy-Tail Guided Layerwise Learning Rates for LLMs

arXiv:2605.22297v1 Announce Type: new Abstract: Learning rate configuration is a fundamental aspect of modern deep learning. The prevailing practice of applying a uniform learning rate across all layers overlooks the structural heterogeneity of Transformers, potentially limiting …

arXiv cs.LG TIER_1 English(EN) · Athanasios Glentis, Jiaxiang Li, Andi Han, Mingyi Hong · 2026-05-22 04:00

Memory-Efficient LLM Pretraining via Minimalist Optimizer Design

arXiv:2506.16659v3 Announce Type: replace Abstract: Training large language models (LLMs) relies on adaptive optimizers such as Adam, which introduce extra operations and require significantly more memory to maintain first- and second-order moments than SGD. While recent works su…

arXiv cs.LG TIER_1 English(EN) · Tom Segal, Asaf Shabtai, Yuval Elovici · 2026-05-22 04:00

Provably Protecting Fine-Tuned LLMs from Training Data Extraction while Preserving Utility

arXiv:2602.00688v2 Announce Type: replace Abstract: Fine-tuning large language models (LLMs) on sensitive datasets raises privacy concerns, as training data extraction (TDE) attacks can expose highly confidential information. Existing defenses against such attacks either lack for…

arXiv cs.LG TIER_1 English(EN) · Rosie Zhao, Anshul Shah, Xiaoyu Zhu, Xinke Deng, Zhongyu Jiang, Yang Yang, Joerg Liebelt, Arnab Mondal · 2026-05-22 04:00

On Robustness and Chain-of-Thought Consistency of RL-Finetuned VLMs

arXiv:2602.12506v3 Announce Type: replace Abstract: Reinforcement learning (RL) finetuning has become a key technique for enhancing large language models (LLMs) on reasoning-intensive tasks, motivating its extension to vision-language models (VLMs). While RL-tuned VLMs improve on…

arXiv cs.LG TIER_1 English(EN) · Huilin Zhou, Jian Zhao, Yilu Zhong, Zhen Liang, Xiuyuan Chen, Yuchen Yuan, Tianle Zhang, Chi Zhang, Lan Zhang, Xuelong Li · 2026-05-22 04:00

Metis: Learning to Jailbreak LLMs via Self-Evolving Metacognitive Policy Optimization

arXiv:2605.10067v3 Announce Type: replace Abstract: Red teaming is critical for uncovering vulnerabilities in Large Language Models (LLMs). While automated methods have improved scalability, existing approaches often rely on static heuristics or stochastic search, rendering them …

arXiv cs.LG TIER_1 English(EN) · Hongbin Zhang, Chaozheng Wang, Kehai Chen, Youcheng Pan, Yang Xiang, Jinpeng Wang, Min Zhang · 2026-05-22 04:00

Tailoring Teaching to Aptitude: Direction-Adaptive Self-Distillation for LLM Reasoning

arXiv:2605.22263v1 Announce Type: new Abstract: On-policy self-distillation (OPSD) is an emerging LLM post-training paradigm in which the model serves as its own teacher: conditioned on privileged information such as a reference trace or hint, the same policy provides dense token…

arXiv cs.AI TIER_1 English(EN) · Akshay Manglik, Apaar Shanker, Kaustubh Deshpande, Jason Qin, Yash Maurya, Veronica Chatrath, Vijay S. Kalmath, Levi Lentz, Yuan (Emily) Xue · 2026-05-22 04:00

Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents

arXiv:2605.21347v2 Announce Type: new Abstract: Diagnosing failures in LLM agents remains largely manual. Practitioners inspect a small subset of execution traces, form ad-hoc hypotheses, and iterate. This process misses patterns that only emerge across trace populations and does…

arXiv cs.AI TIER_1 English(EN) · Can Hankendi, Rana Shahout, Minlan Yu, Ayse K. Coskun · 2026-05-22 04:00

PALS: Power-Aware LLM Serving for Mixture-of-Experts Models

arXiv:2605.21427v1 Announce Type: new Abstract: Large language model (LLM) inference has become a dominant workload in modern data centers, driving significant GPU utilization and energy consumption. While prior systems optimize throughput and latency by batching, scheduling, and…

arXiv cs.AI TIER_1 English(EN) · Aisvarya Adeseye, Jouni Isoaho, Adeyemi Adeseye · 2026-05-22 04:00

Parallel LLM Reasoning for Bias-Resilient, Robust Conceptual Abstraction

arXiv:2605.20194v1 Announce Type: cross Abstract: Large language models (LLMs) have been increasingly used to analyze text. However, they are often plagued with contextual reasoning limitations when analyzing long documents. When long documents are processed sequentially, early o…

arXiv cs.AI TIER_1 English(EN) · Reese Levine, Rithik Sharma, Nikhil Jain, Abhijit Ramesh, Zheyuan Chen, Neha Abbas, James Contini, Tyler Sorensen · 2026-05-22 04:00

Llamas on the Web: Memory-Efficient, Performance-Portable, and Multi-Precision LLM Inference with WebGPU

arXiv:2605.20706v1 Announce Type: cross Abstract: Running language models in the browser presents a unique opportunity to build efficient, private, and portable AI applications, but requires contending with constrained memory availability and heterogeneous hardware targets. To re…

arXiv cs.AI TIER_1 English(EN) · Yicheng Feng, Xin Tan, Yangtao Deng, Yimin Jiang, Yibo Zhu, Hong Xu · 2026-05-22 04:00

Frontier: Towards Comprehensive and Accurate LLM Inference Simulation

arXiv:2605.21312v1 Announce Type: cross Abstract: Modern LLM serving is no longer homogeneous or monolithic. Production systems now combine disaggregated execution, complex parallelism, runtime optimizations, and stateful workloads such as reasoning, agents, and RL rollouts. Simu…

arXiv cs.AI TIER_1 English(EN) · Jaemin Kim, Hangeol Chang, Hyunmin Hwang, Choonghan Kim, Jong Chul Ye · 2026-05-22 04:00

Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs

arXiv:2505.19075v3 Announce Type: replace Abstract: Large Language Models (LLMs) have demonstrated remarkable general capabilities, but enhancing skills such as reasoning often demands substantial computational resources and may compromise generalization. While Parameter-Efficien…

arXiv cs.AI TIER_1 English(EN) · Qizheng Li, Yifei Zhang, Xiao Yang, Xu Yang, Zhuo Wang, Weiqing Liu, Jiang Bian · 2026-05-22 04:00

FT-Dojo: Towards Autonomous LLM Fine-Tuning with Language Agents

arXiv:2603.01712v2 Announce Type: replace Abstract: Fine-tuning large language models for vertical domains remains labor-intensive, requiring practitioners to curate data, configure training, and iteratively diagnose model behavior. Despite growing interest in autonomous machine …

arXiv cs.AI TIER_1 English(EN) · Xian Wu, Kaijie Zhu, Ying Zhang, Lun Wang, Wenbo Guo · 2026-05-22 04:00

rePIRL: Learn PRM with Inverse RL for LLM Reasoning

arXiv:2602.07832v2 Announce Type: replace-cross Abstract: Process rewards have been widely used in deep reinforcement learning to improve training efficiency, reduce variance, and prevent reward hacking. In LLM reasoning, existing works also explore various solutions for learning…

arXiv cs.AI TIER_1 English(EN) · Mengtian Yang, Zhekun Zhang, Mingheng Wu, Jianwen Yan, Hanshi Sun, Li-wen Chang · 2026-05-22 04:00

Charon: A Unified and Fine-Grained Simulator for Large-Scale LLM Training and Inference

arXiv:2605.17164v2 Announce Type: replace-cross Abstract: Deploying large-scale LLM training and inference with optimal performance is exceptionally challenging due to a complex design space of parallelism strategies, system optimizations, and hardware configurations. Accurate an…

arXiv cs.AI TIER_1 English(EN) · Zikai Alex Wen · 2026-05-22 04:00

Toward User Comprehension Supports for LLM Agent Skill Specifications

arXiv:2605.19362v2 Announce Type: replace-cross Abstract: Users often interpret and select agent skills through their SKILL markdown specifications. To protect users, existing audits mainly focus on malicious or unsafe skills. We study the complementary question of whether specif…

arXiv cs.CL TIER_1 English(EN) · Zhenwei Tang, Zhaoyan Liu, Rasa Hosseinzadeh, Tongzi Wu, Keyvan Golestan, Jesse C. Cresswell · 2026-05-22 04:00

RankJudge: A Multi-Turn LLM-as-a-Judge Synthetic Benchmark Generator

arXiv:2605.21748v1 Announce Type: new Abstract: As interactive LLM-based applications are created and refined, model developers need to evaluate the quality of generated text along many possible axes. For simpler systems, human evaluation may be practical, but in complicated syst…

arXiv cs.CL TIER_1 English(EN) · Xiaoyuan Li, Yubo Ma, Chengpeng Li, Fengbin Zhu, Yiyao Yu, Keqin Bao, Wenjie Wang, Fuli Feng, Dayiheng Liu · 2026-05-22 04:00

Unified Data Selection for LLM Reasoning

arXiv:2605.22389v1 Announce Type: new Abstract: Effectively training Large Language Models (LLMs) for complex, long-CoT reasoning is often bottlenecked by the need for massive high-quality reasoning data. Existing methods are either computationally expensive or fail to reliably d…

arXiv cs.CL TIER_1 English(EN) · Arip Asadulaev, Daniil Ognev, Karim Salta, Martin Takac · 2026-05-22 04:00

Value-Gradient Hypothesis of RL for LLMs

arXiv:2605.21654v1 Announce Type: cross Abstract: Reinforcement learning substantially improves pretrained language models, but it remains understudied why critic-free methods such as PPO and GRPO work as well as they do, and when they should provide the largest gains. We develop…

arXiv cs.CL TIER_1 English(EN) · Xing Zhang, Yanwei Cui, Guanghui Wang, Ziyuan Li, Wei Qiu, Bing Zhu, Peiyang He · 2026-05-22 04:00

Ratchet: A Minimal Hygiene Recipe for Self-Evolving LLM Agents

arXiv:2605.22148v1 Announce Type: cross Abstract: Self-evolving skill libraries, pioneered by Voyager, let frozen LLM agents accumulate reusable knowledge without weight updates, yet recent evaluation shows that LLM-authored skills deliver +0.0pp over no-skill baselines while h…

arXiv cs.CL TIER_1 English(EN) · Fengfei Yu, Ruijia Niu, Dongxia Wu, Yian Ma, Rose Yu · 2026-05-22 04:00

Calibrating LLMs with Semantic-level Reward

arXiv:2605.15588v2 Announce Type: replace Abstract: As large language models (LLMs) are deployed in consequential settings such as medical question answering and legal reasoning, the ability to estimate when their outputs are likely to be correct is essential for safe and reliabl…

arXiv cs.CL TIER_1 English(EN) · Alexandre Cristovão Maiorano · 2026-05-22 04:00

LLM Readiness Harness: Evaluation, Observability, and CI Gates for LLM/RAG Applications

arXiv:2603.27355v2 Announce Type: replace-cross Abstract: We present a readiness harness for LLM and RAG applications that turns evaluation into a deployment decision workflow. The system combines automated benchmarks, OpenTelemetry observability, and CI quality gates under a min…

arXiv cs.LG TIER_1 English(EN) · Andy Han, Kristina Fujimoto, Avidan Shah, Kiet Nguyen, Kai Xu, Chen Yueh-Han, Ilia Sucholutsky, Rico Angell · 2026-05-22 04:00

On-Policy Consistency Training Improves LLM Safety with Minimal Capability Degradation

arXiv:2605.21834v1 Announce Type: new Abstract: Aligned models can misbehave in several ways: they are often sycophantic, fall victim to jailbreaks, or fail to include appropriate safety warnings. Consistency training is a promising new alignment paradigm to mitigate such failure…

arXiv cs.LG TIER_1 English(EN) · Yu Li, Rui Miao, Tian Lan, Zhengling Qi · 2026-05-22 04:00

OPPO: Bayesian Value Recursion for Token-Level Credit Assignment in LLM Reasoning

arXiv:2605.21851v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards has become the standard recipe for improving LLM reasoning, but the dominant algorithm GRPO assigns a single trajectory-level advantage to every token, diluting the signal at pivotal re…

arXiv cs.LG TIER_1 English(EN) · Yifan Lan, Yuanpu Cao, Hanyu Wang, Lu Lin, Jinghui Chen · 2026-05-22 04:00

The Illusion of Reasoning: Exposing Evasive Data Contamination in LLMs via Zero-CoT Truncation

arXiv:2605.21856v1 Announce Type: new Abstract: Large language models (LLMs) have demonstrated impressive reasoning abilities across a wide range of tasks, but data contamination undermines the objective evaluation of these capabilities. This problem is further exacerbated by mal…

arXiv cs.CL TIER_1 English(EN) · Jitao Sang · 2026-05-22 02:42

Positional Failures in Long-Context LLMs: A Blind Spot in Reasoning Benchmarks

Position-controlled evaluation is standard for retrieval tasks such as Needle-in-a-Haystack and RULER, but mainstream reasoning benchmarks do not control positional placement of target tasks in long contexts. We audit 11 long-context benchmarks and find none jointly controls task…

arXiv cs.CL TIER_1 English(EN) · Dayiheng Liu · 2026-05-21 12:21

Unified Data Selection for LLM Reasoning

Effectively training Large Language Models (LLMs) for complex, long-CoT reasoning is often bottlenecked by the need for massive high-quality reasoning data. Existing methods are either computationally expensive or fail to reliably distinguish high- from low-quality reasoning samp…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-21 08:20

Ratchet: A Minimal Hygiene Recipe for Self-Evolving LLM Agents

Self-evolving skill libraries, pioneered by Voyager, let frozen LLM agents accumulate reusable knowledge without weight updates, yet recent evaluation shows that LLM-authored skills deliver +0.0pp over no-skill baselines while human-curated ones deliver +16.2pp: the bottlenec…

arXiv cs.CL TIER_1 English(EN) · Peiyang He · 2026-05-21 08:20

Ratchet: A Minimal Hygiene Recipe for Self-Evolving LLM Agents

Self-evolving skill libraries, pioneered by Voyager, let frozen LLM agents accumulate reusable knowledge without weight updates, yet recent evaluation shows that LLM-authored skills deliver +0.0pp over no-skill baselines while human-curated ones deliver +16.2pp: the bottlenec…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-21 00:00

The Illusion of Reasoning: Exposing Evasive Data Contamination in LLMs via Zero-CoT Truncation

A black-box detection method called Zero-CoT Probe is introduced to identify data contamination in large language models by truncating reasoning processes and comparing performance on original and perturbed datasets.

arXiv cs.CL TIER_1 English(EN) · Yu Meng · 2026-05-20 17:53

You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories

Reinforcement learning with verifiable rewards (RLVR) has become a dominant paradigm for improving reasoning in large language models (LLMs), yet the underlying geometry of the resulting parameter trajectories remains underexplored. In this work, we demonstrate that RLVR weight t…

arXiv cs.AI TIER_1 English(EN) · Ayse K. Coskun · 2026-05-20 17:19

PALS: Power-Aware LLM Serving for Mixture-of-Experts Models

Large language model (LLM) inference has become a dominant workload in modern data centers, driving significant GPU utilization and energy consumption. While prior systems optimize throughput and latency by batching, scheduling, and parallelism, they largely treat GPU power as a …

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-20 17:19

PALS: Power-Aware LLM Serving for Mixture-of-Experts Models

Large language model (LLM) inference has become a dominant workload in modern data centers, driving significant GPU utilization and energy consumption. While prior systems optimize throughput and latency by batching, scheduling, and parallelism, they largely treat GPU power as a …

arXiv cs.AI TIER_1 English(EN) · Xue · 2026-05-20 16:13

Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents

Diagnosing failures in LLM agents remains largely manual. Practitioners inspect a small subset of execution traces, form ad-hoc hypotheses, and iterate. This process misses patterns that only emerge across trace populations and does not scale to production corpora where individua…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-20 16:13

Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents

Diagnosing failures in LLM agents remains largely manual. Practitioners inspect a small subset of execution traces, form ad-hoc hypotheses, and iterate. This process misses patterns that only emerge across trace populations and does not scale to production corpora where individua…

arXiv cs.AI TIER_1 English(EN) · Hong Xu · 2026-05-20 15:40

Frontier: Towards Comprehensive and Accurate LLM Inference Simulation

Modern LLM serving is no longer homogeneous or monolithic. Production systems now combine disaggregated execution, complex parallelism, runtime optimizations, and stateful workloads such as reasoning, agents, and RL rollouts. Simulation is attractive for exploring this growing de…

arXiv cs.AI TIER_1 English(EN) · Tyler Sorensen · 2026-05-20 05:05

Llamas on the Web: Memory-Efficient, Performance-Portable, and Multi-Precision LLM Inference with WebGPU

Running language models in the browser presents a unique opportunity to build efficient, private, and portable AI applications, but requires contending with constrained memory availability and heterogeneous hardware targets. To realize this opportunity, we present Llamas on the W…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-20 00:00

RankJudge: A Multi-Turn LLM-as-a-Judge Synthetic Benchmark Generator

A benchmark generator called RankJudge evaluates large language model judges on multi-turn conversations by creating flawed conversation pairs and using statistical models for ranking and difficulty assessment.

arXiv cs.CL TIER_1 English(EN) · Yuzhang Shang · 2026-05-19 17:59

TIDE: Efficient and Lossless MoE Diffusion LLM Inference with I/O-aware Expert Offload

Diffusion Large Language Models (dLLMs) have emerged as a competitive alternative to autoregressive (AR) models, offering better hardware utilization and bidirectional context through parallel block-level decoding. However, as dLLMs continue to scale up with mixture-of-experts (M…

arXiv cs.CL TIER_1 English(EN) · Yinghuan Shi · 2026-05-19 13:44

Are Tools Always Beneficial? Learning to Invoke Tools Adaptively for Dual-Mode Multimodal LLM Reasoning

Tool-augmented reasoning has emerged as a promising direction for enhancing the reasoning capabilities of multimodal large language models (MLLMs). However, existing studies mainly focus on enabling models to perform tool invocation, while neglecting the necessity of invoking too…

arXiv cs.AI TIER_1 English(EN) · Egor Shvetsov · 2026-05-19 12:48

Prior Knowledge or Search? A Study of LLM Agents in Hardware-Aware Code Optimization

LLM discovery and optimization systems are increasingly applied across domains, implementing a common propose-evaluate-revise loop. Such optimization or discovery progresses via context conditioning on received feedback from an environment. However, as modern LLM agents are incre…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-19 12:48

Prior Knowledge or Search? A Study of LLM Agents in Hardware-Aware Code Optimization

LLM discovery and optimization systems are increasingly applied across domains, implementing a common propose-evaluate-revise loop. Such optimization or discovery progresses via context conditioning on received feedback from an environment. However, as modern LLM agents are incre…

arXiv cs.CL TIER_1 English(EN) · Xuanjing Huang · 2026-05-19 09:40

LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening

Evaluating large language models (LLMs) on natural-language logical reasoning is essential because rule-governed tasks require conclusions to follow strictly from stated premises. Many existing logical-reasoning benchmarks are generated by templating natural-language items from s…

arXiv cs.CL TIER_1 English(EN) · Jieping Ye · 2026-05-19 06:42

Backtracking When It Strays: Mitigating Dual Exposure Biases in LLM Reasoning Distillation

Large language models (LLMs) have achieved remarkable success in complex reasoning tasks via long chain-of-thought (CoT), yet their immense computational overhead hinders real-world deployment. LLM reasoning distillation addresses this by transferring reasoning capabilities from …

arXiv cs.CL TIER_1 English(EN) · Jitao Sang · 2026-05-19 04:41

Taming the Thinker: Conditional Entropy Shaping for Adaptive LLM Reasoning

Entropy-based deep reasoning has emerged as a promising direction for improving the reasoning capabilities of Large Language Models (LLMs), but existing methods often either increase response length indiscriminately or shorten responses at the cost of accuracy. To better balance …

arXiv cs.CL TIER_1 English(EN) · Hua Wei · 2026-05-19 00:57

Diagnosing Multi-step Reasoning Failures in Black-box LLMs via Stepwise Confidence Attribution

Large Language Models have achieved strong performance on reasoning tasks with objective answers by generating step-by-step solutions, but diagnosing where a multi-step reasoning trace might fail remains difficult. Confidence estimation offers a diagnostic signal, yet existing me…

arXiv cs.AI TIER_1 English(EN) · Pascal Van Hentenryck · 2026-05-18 17:28

Democratizing Large-Scale Re-Optimization with LLM-Guided Model Patches

Optimization models developed by operations research (OR) experts are often deployed as decision-support systems in industrial settings. However, real-world environments are dynamic, with evolving business rules, previously overlooked constraints, and unforeseen perturbations. In…

arXiv cs.AI TIER_1 English(EN) · Shaowu Pan · 2026-05-18 16:34

SCICONVBENCH: Benchmarking LLMs on Multi-Turn Clarification for Task Formulation in Computational Science

Large Language Models (LLMs) are increasingly deployed as scientific AI assistants, and a growing body of benchmarks evaluates their capabilities across knowledge retrieval, reasoning, code generation, and tool use. These evaluations, however, typically assume the scientific pr…

arXiv cs.CL TIER_1 English(EN) · Song Guo · 2026-05-18 08:54

KVDrive: A Holistic Multi-Tier KV Cache Management System for Long-Context LLM Inference

Supporting long-context LLMs is challenging due to the substantial memory demands of the key-value (KV) cache. Existing offloading systems store the full cache in host memory and selectively fetch critical entries during decoding, but this strategy quickly hits a ceiling: sparsit…

arXiv cs.CL TIER_1 English(EN) · Maosong Sun · 2026-05-18 07:33

AutoVecCoder: Teaching LLMs to Generate Explicitly Vectorized Code

Vectorization via Single Instruction, Multiple Data (SIMD) architectures is a cornerstone of high-performance computing. To fully exploit hardware potential, developers often resort to explicit vectorization using intrinsics, as compiler-based auto-vectorization frequently yields…

arXiv cs.MA (Multiagent) TIER_1 English(EN) · James Evans · 2026-05-16 23:29

Multi-LLM Systems Exhibit Robust Semantic Collapse

Whether machines can originate novel content has been debated for nearly two centuries, from Lovelace's assertion that no engine can "originate anything" to Turing's question of whether a machine can amplify ideas brought in from outside. Multi-large language model (LLM) systems,…

arXiv cs.LG TIER_1 English(EN) · Wes Armour · 2026-05-15 17:03

Runtime-Orchestrated Second-Order Optimization for Scalable LLM Training

Second-order methods offer an attractive path toward more sample-efficient LLM training, but their practical use is often blocked by the systems cost of maintaining and updating large matrix-based optimizer states. We introduce Asteria, a runtime system designed to remov…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-15 00:00

Rule2DRC: Benchmarking LLM Agents for DRC Script Synthesis with Execution-Guided Test Generation

Rule2DRC introduces a large-scale benchmark for DRC script synthesis with 1,000 rule-to-script tasks and 13,921 evaluation layouts, along with SplitTester which improves program selection through execution-based feedback.

arXiv cs.CV TIER_1 English(EN) · Haohuan Fu · 2026-06-30 16:08

Attend, Transform, or Silence: Operator-Level Visual Skipping for Efficient Multimodal LLM Inference

Multimodal large language models (MLLMs) increasingly process long visual-token sequences, increasing the overall inference computation. Existing acceleration methods usually remove visual tokens or skip visual-token updates in entire layers, but these coarse strategies may disca…

arXiv stat.ML TIER_1 English(EN) · Johannes Zenn, Jonas Geiping · 2026-06-26 04:00

When are likely answers right? On Sequence Probability and Correctness in LLMs

arXiv:2606.27359v1 Announce Type: new Abstract: Many decoding methods for large language models can be understood as shifting probability mass toward outputs that are more likely under the model, either locally at the token level or globally at the sequence level. Therefore, thei…

arXiv stat.ML TIER_1 English(EN) · Jonas Geiping · 2026-06-25 17:58

When are likely answers right? On Sequence Probability and Correctness in LLMs

Many decoding methods for large language models can be understood as shifting probability mass toward outputs that are more likely under the model, either locally at the token level or globally at the sequence level. Therefore, their success depends on a fundamental question: whe…

LessWrong (AI tag) TIER_1 English(EN) · Josh Engels · 2026-06-22 22:26

LLM-Driven Feature Discovery

We would often like to get a qualitative sense of a target model’s behaviors in important distributions (e.g. deployment, RL training, or evals). For example, we might want to discover novel behaviors (https://alignment.anthropic.com/2026/petri-v2/)…

arXiv stat.ML TIER_1 English(EN) · Etienne Casanova, Rafal Kocielnik, R. Michael Alvarez · 2026-06-02 04:00

On the Limits of LLM Adaptability: Impact of Model-Internalized Priors on Annotation Task Performance

arXiv:2606.00467v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly used for zero-shot annotation and LLM-as-a-judge tasks, yet their reliability hinges on how model-internalized priors interact with user-provided instructions. We investigate three dim…

arXiv stat.ML TIER_1 English(EN) · Jingkai Huang, Will Ma, Zhengyuan Zhou · 2026-06-02 04:00

Optimal Bayesian Stopping for Efficient Inference of Consistent LLM Answers

arXiv:2602.05395v2 Announce Type: replace Abstract: A simple strategy for improving LLM accuracy, especially in math and reasoning problems, is to sample multiple responses and submit the answer most consistently reached. In this paper we leverage Bayesian prior information to sa…

arXiv cs.CV TIER_1 English(EN) · Hyeonwoo Cho, DongHyeon Baek, Yewon Kim, Bumsub Ham · 2026-06-02 04:00

Improving Visual Token Reduction via Rectifying Distortions for Efficient Multimodal LLM Inference

arXiv:2606.01711v1 Announce Type: new Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) have achieved remarkable success in vision-language tasks, yet the quadratic computational complexity arising from the vast number of visual tokens incurs significant m…

arXiv stat.ML TIER_1 English(EN) · R. Michael Alvarez · 2026-05-30 01:21

On the Limits of LLM Adaptability: Impact of Model-Internalized Priors on Annotation Task Performance

Large Language Models (LLMs) are increasingly used for zero-shot annotation and LLM-as-a-judge tasks, yet their reliability hinges on how model-internalized priors interact with user-provided instructions. We investigate three dimensions of this interaction: (1) how an LLM's fami…

arXiv stat.ML TIER_1 English(EN) · Jiachun Li, David Simchi-Levi, Will Wei Sun · 2026-05-29 04:00

Low Rank for Rank: Uncertainty-Aware Task-Specific LLM Ranking under Sparse Pairwise Comparisons

arXiv:2605.29395v1 Announce Type: cross Abstract: Pairwise human-preference platforms such as Chatbot Arena have become central to large language model (LLM) evaluation, yet reliable task-specific ranking remains challenging. Global leaderboards mask task heterogeneity, while ran…

arXiv stat.ML TIER_1 English(EN) · Will Wei Sun · 2026-05-28 05:44

Low Rank for Rank: Uncertainty-Aware Task-Specific LLM Ranking under Sparse Pairwise Comparisons

Pairwise human-preference platforms such as Chatbot Arena have become central to large language model (LLM) evaluation, yet reliable task-specific ranking remains challenging. Global leaderboards mask task heterogeneity, while ranking each fine-grained task independently is unsta…

arXiv stat.ML TIER_1 English(EN) · Paula Cordero-Encinar, Georgy Tyukin, Andrew B. Duncan · 2026-05-28 04:00

Soft Specialists: α-Rényi Ensembles for Uncertainty-Aware LLM Post-Training

arXiv:2605.27747v1 Announce Type: new Abstract: Existing training approaches for large language models learn a single set of parameters, based on large volumes of data, which is typically heterogeneous, conflicting and often outright contradictory. As a result, the model is force…

arXiv stat.ML TIER_1 English(EN) · Shijin Gong, Erhan Xu, Kai Ye, Francesco Quinzan, Giulia Livieri, Chengchun Shi · 2026-05-27 04:00

BASIS: Batchwise Advantage Estimation from Single-Rollout Information Sharing for LLM Reasoning

arXiv:2605.27293v1 Announce Type: cross Abstract: Reinforcement learning with verifiable rewards has become a standard recipe for improving the reasoning abilities of large language models. Existing algorithms face a tradeoff between computational efficiency and sample efficiency…

arXiv stat.ML TIER_1 English(EN) · Andrew B. Duncan · 2026-05-26 22:44

Soft Specialists: α-Rényi Ensembles for Uncertainty-Aware LLM Post-Training

Existing training approaches for large language models learn a single set of parameters, based on large volumes of data, which is typically heterogeneous, conflicting and often outright contradictory. As a result, the model is forced to compress conflicting goals, and inherent un…

arXiv stat.ML TIER_1 English(EN) · Chengchun Shi · 2026-05-26 17:06

BASIS: Batchwise Advantage Estimation from Single-Rollout Information Sharing for LLM Reasoning

Reinforcement learning with verifiable rewards has become a standard recipe for improving the reasoning abilities of large language models. Existing algorithms face a tradeoff between computational efficiency and sample efficiency in value estimation and policy learning. We intro…

arXiv cs.CV TIER_1 English(EN) · Zehao Wang, Yihan Zeng, Zidong Gong, Yuanfan Guo, Feng Zhu, Hongzhi Zhang, Wei Zhang, Wangmeng Zuo · 2026-05-26 04:00

AnE: Pushing the Reasoning Frontier of Multimodal LLMs via Anchor Evolution

arXiv:2605.25571v1 Announce Type: new Abstract: Post-training via Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) is crucial for enhancing reasoning in Multimodal Large Language Models (MLLMs), yet existing paradigms often reach a performance bottleneck due to the li…

arXiv stat.ML TIER_1 English(EN) · Junghyun Lee, Sanghwa Kim, Yassir Jedra, Alexandre Proutière, Se-Young Yun · 2026-05-25 04:00

Instance-Optimal Estimation with Multiple LLM Judges on a Budget

arXiv:2605.23362v1 Announce Type: cross Abstract: Evaluating large language models increasingly relies on LLM-as-a-judge protocols, but such evaluations remain costly: different judges have different prices and reliabilities, and the difficulty of each prompt-response pair can va…

arXiv stat.ML TIER_1 English(EN) · Weijie Su · 2026-05-23 01:18

CurveRL: Principled Distribution-Aware Context Reweighting for LLM Reasoning

Context or prompt-level reweighting has emerged as a central algorithmic lever in Reinforcement Learning with Verified Rewards (RLVR) for improving the reasoning capability of large language models, yet the principle determining what constitutes an optimal weighting remains poorl…

arXiv stat.ML TIER_1 English(EN) · Se-Young Yun · 2026-05-22 08:26

Instance-Optimal Estimation with Multiple LLM Judges on a Budget

Evaluating large language models increasingly relies on LLM-as-a-judge protocols, but such evaluations remain costly: different judges have different prices and reliabilities, and the difficulty of each prompt-response pair can vary substantially. This raises a basic allocation q…

arXiv stat.ML TIER_1 English(EN) · Hamed Khosravi, Xiaoming Huo · 2026-05-21 04:00

Conformal Selective Acting: Anytime-Valid Risk Control for RLVR-Trained LLMs

arXiv:2605.20270v1 Announce Type: cross Abstract: A local specialist LLM, fine-tuned with reinforcement learning from verifiable rewards (RLVR) on operator-local data, is installed in a regulated organization with per-deployment error budget α. The operator needs a safety …

arXiv stat.ML TIER_1 English(EN) · J. G. Dai, Tianze Deng, Yueying Li, Tianyi Peng · 2026-05-19 04:00

Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents

arXiv:2504.07347v3 Announce Type: replace Abstract: As demand for Large Language Models (LLMs) and AI agents grows rapidly, optimizing systems for efficient LLM inference becomes critical. While significant efforts have targeted system-level engineering, little has been explored …

arXiv stat.ML TIER_1 English(EN) · Xiaoming Huo · 2026-05-18 22:20

Conformal Selective Acting: Anytime-Valid Risk Control for RLVR-Trained LLMs

A local specialist LLM, fine-tuned with reinforcement learning from verifiable rewards (RLVR) on operator-local data, is installed in a regulated organization with per-deployment error budget α. The operator needs a safety certificate for this deployment's stream at every round…

arXiv stat.ML TIER_1 English(EN) · Ruicheng Ao, Gan Luo, David Simchi-Levi, Xinshang Wang · 2026-05-18 04:00

Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints

arXiv:2504.11320v3 Announce Type: replace-cross Abstract: Large language models now serve millions of users daily, with providers incurring costs exceeding $700,000 per day. Each request requires token-by-token inference, making GPU scheduling central to latency, capacity, and co…

雷峰网 (Leiphone) TIER_1 Chinese(ZH) · 2026-06-29 06:36

ICML 2026 | When Large Models Start Inventing Their Own Languages: How to Make LLMs Complete High-Intensity Reasoning with Fewer Tokens

Original author: WeChat official account “专知”. Original link: https://mp.weixin.qq.com/s/GYp8zFf-C5pXqHMSDNT2Aw …

AWS Machine Learning Blog TIER_1 English(EN) · Sandeep Raveesh-Babu · 2026-05-29 23:36

Comprehensive observability for Amazon SageMaker AI LLM inference: From GPU utilization to LLM quality

This post demonstrates a comprehensive observability solution using Amazon Managed Grafana dashboards that provides a holistic view of both quality and quantity for LLMs served on Amazon SageMaker AI endpoints with inference components.

Databricks Blog TIER_1 English(EN) · 2026-05-27 20:20

Reliable LLM Inference at Scale

At Databricks, we’ve built a unique inference platform that serves every frontier...

Databricks Blog TIER_1 English(EN) · 2026-05-22 20:00

Accelerating LLM Inference with Prompt Caching for Open‑Source Models on Databricks

Why Prompt Caching Matters: Large language model (LLM) inference often involves repeated...

Together AI blog TIER_1 English(EN) · 2026-04-03 00:00

AI for Systems: Using LLMs to Optimize Database Query Execution

New research shows LLMs can optimize database query execution plans—achieving up to 4.78x speedups by correcting the cardinality estimation errors that statistical heuristics miss.

Together AI blog TIER_1 English(EN) · 2025-06-11 00:00

Introducing the Together AI Batch API: Process Thousands of LLM Requests at 50% Lower Cost

Together AI blog TIER_1 English(EN) · 2025-05-28 00:00

Mixture-of-Agents Alignment: Harnessing the Collective Intelligence of Open-Source LLMs to Improve Post-Training

Anyscale blog TIER_1 English(EN) · 2026-06-18 09:00

High Performance Distributed Inference with Ray Serve LLM

Learn how Ray Serve LLM + vLLM stack achieves up to 24x higher throughput with direct streaming, HAProxy integration, and a new vLLM Ray executor backend.

Hacker News — AI stories ≥50 points TIER_1 English(EN) · AMavorParker · 2026-05-20 21:11

PopuLoRA: Co-Evolving LLM Populations for Reasoning Self-Play

Medium — fine-tuning tag TIER_1 Korean(KO) · YouShin kim · 2026-07-04 00:57

UCLA & Optum AI — Innovation in LLM Inference Model Training: How to Select High-Quality Data by Looking Only at the 'First 100 Tokens'

https://medium.com/@mdpman/ucla-optum-ai-llm-%EC%B6%94%EB%A1%A0-%EB%AA%A8%EB%8D%B8-%ED%95%99%EC%8A%B5%EC%9D%98-%ED%98%81%EC%8B%A0-%EC%B2%AB-100%ED%86%A0%ED%81%B0%EB%A7%8C-%EB%B3%B4%EA%B3%A0-%EA%B3%A0%ED%92%88%EC%…

Towards AI TIER_1 English(EN) · Suchitra Malimbada · 2026-07-02 14:31

Why 4-Bit Weights Are Easy and 8-Bit Activations Break Models: Inside LLM Inference, Part 3

A systems-level mental model of quantization, built from the asymmetry that explains every method in the field. Quantizing the weights of a large languag…

Medium — MLOps tag TIER_1 English(EN) · Hatemazaiez · 2026-07-02 10:43

I Measured How Inference Concurrency Silently Degrades LLM Reasoning Quality

https://medium.com/@hatemazaiez1/i-measured-how-inference-concurrency-silently-degrades-llm-reasoning-quality-9074189fce5e?source=rss------mlops-5 …

Towards AI TIER_1 English(EN) · Dylan Tartarini · 2026-06-29 17:31

Build and Query Knowledge Graphs with LLMs

https://pub.towardsai.net/build-and-query-knowledge-graphs-with-llms-4f39251df792?source=rss----98111c9905da---4 …

Towards AI TIER_1 English(EN) · Artha Mukherjee · 2026-06-23 22:01

A GPU-Poor’s Guide to Local LLM Inference in 2026

MoE math. KV cache quants below q8_0. MCP-based tooling. Worked example: a 35B Mixture-of-Experts on 6 GB of VRAM. [Figure: The 6 GB laptop running, with terminal + chat UI visible] …

Medium — MLOps tag TIER_1 English(EN) · Sami · 2026-06-21 14:25

One Number Lies: How to Actually Measure LLM Inference

https://medium.com/@ing.benali.sami/one-number-lies-how-to-actually-measure-llm-inference-0b78e6572a33?source=rss------mlops-5 …

Medium — fine-tuning tag TIER_1 English(EN) · Jose Miguel Arrieta · 2026-06-20 13:38

LoRA Notes: Fine-Tuning Large Models with Fewer Parameters

https://medium.com/data-science-hub/lora-notes-fine-tuning-large-models-with-fewer-parameters-756cafd5662a?source=rss------fine_tuning-5 …

Medium — MLOps tag TIER_1 English(EN) · Michiel Horstman · 2026-06-19 23:10

Model Merging for Dummies: Combine LLMs Without Training

https://michielh.medium.com/model-merging-for-dummies-combine-llms-without-training-7d7173c069bc?source=rss------mlops-5 …

Towards AI TIER_1 English(EN) · ChienLoong · 2026-06-16 12:31

The Inference Reckoning: How to Stop Burning Millions on Cloud LLM Tokens

Imagine checking your enterprise cloud billing dashboard on a Monday morning and seeing a sudden, violent $45,000 spike. …

Medium — MLOps tag TIER_1 English(EN) · The_Turingetic_Guy · 2026-06-15 17:27

Large-Scale Distributed LLM Inference — Part 3 : Inference Metrics, Scheduling Strategies, and…

<div class="medium-feed-item"><p class="medium-feed-image"><a href="https://medium.com/@the_turingetic_guy/large-scale-distributed-llm-inference-part-3-inference-metrics-scheduling-strategies-and-f115e8933b48?source=rss------mlops-5"><img src="https://cdn-images-1.medium.com/max/…

dev.to — MCP tag TIER_1 English(EN) · Gurutva Murdia · 2026-06-11 18:24

Introducing Duplex: A Zero-Backend, Multiplexed LLM Inference Engine for True Client-Side Parallel AI

<p>Hi there. I’m Gurutva Murdia, the developer behind Duplex. Today I’m excited to share the story, architecture, and technical deep dives of a project that’s been consuming my focus for months: a fully decentralised, browser-native wrapper that lets you run multiple Large Langu…

Towards AI TIER_1 English(EN) · Abhinandan Malhotra · 2026-06-10 14:31

Optimizing Local LLM Inference on Constrained Hardware

<h4>An engineering deep dive into KV cache quantization, asymmetric thread tuning, and PCIe bottlenecks</h4><h3><strong>Introduction</strong></h3><p>New frontier models launch weekly, and for most developers, the testing phase abruptly ends when the API bill arrives or the rate l…

Medium — MLOps tag TIER_1 English(EN) · Rayari · 2026-06-08 21:15

The Black Box Nobody Talks About: A Deep Dive into LLM Inference Engineering

<div class="medium-feed-item"><p class="medium-feed-image"><a href="https://medium.com/@rayari1729/the-black-box-nobody-talks-about-a-deep-dive-into-llm-inference-engineering-e71dd94f4624?source=rss------mlops-5"><img src="https://cdn-images-1.medium.com/max/2600/1*hNWi-AW7w6OqP2…

Medium — MLOps tag TIER_1 English(EN) · jagesh maharjan · 2026-06-07 15:56

LLM Training: The 5D Parallelism Universe

<div class="medium-feed-item"><p class="medium-feed-image"><a href="https://medium.com/@JugsMa/llm-training-the-5d-parallelism-universe-ff0045b20bd4?source=rss------mlops-5"><img src="https://cdn-images-1.medium.com/max/2600/1*o6tQyOQIPeW6SoiMsefFLw.png" width="2816" /></a></p><p…

Medium — MLOps tag TIER_1 English(EN) · Avishek Jana · 2026-06-04 03:18

Understanding LLM Precision — How Bit Formats Shape Training, Inference, and Quality

<div class="medium-feed-item"><p class="medium-feed-image"><a href="https://blog.geogo.in/understanding-llm-precision-how-bit-formats-shape-training-inference-and-quality-1cd0550bd717?source=rss------mlops-5"><img src="https://cdn-images-1.medium.com/max/2486/1*QKMWDFTC5jcwe07NPc…

Towards AI TIER_1 English(EN) · Shakti Wadekar · 2026-05-30 05:14

The Evolution of LLM Inference: Decoding algorithms — Part 1

<p>LLM inference optimization can be understood along three major axes: <strong>memory optimization, compute optimization, and decoding algorithms</strong>. Compared to memory and compute optimizations, decoding algorithms are often discussed less, even though they are becoming i…

Medium — MLOps tag TIER_1 English(EN) · The_Turingetic_Guy · 2026-05-24 15:53

Large-Scale Distributed LLM Inference — Part 1

<div class="medium-feed-item"><p class="medium-feed-image"><a href="https://medium.com/@rtxtdfs/large-scale-distributed-llm-inference-part-1-54343375c2c4?source=rss------mlops-5"><img src="https://cdn-images-1.medium.com/max/798/1*H-GnzHY45Yo7AnuLCpspfw.png" width="798" /></a></p…

Medium — fine-tuning tag TIER_1 English(EN) · Boring Developer · 2026-05-23 11:26

Fine-Tuning LLM: Building Personality of AI

<div class="medium-feed-item"><p class="medium-feed-image"><a href="https://medium.com/@parthbissa5/fine-tuning-llm-building-personality-of-ai-fa74b8a40c0d?source=rss------fine_tuning-5"><img src="https://cdn-images-1.medium.com/max/600/1*E-guVNJTOIosxYAi2SPstw.jpeg" width="600" …

Medium — fine-tuning tag TIER_1 English(EN) · QuarkAndCode · 2026-05-21 07:48

Fine-Tuning and Alignment: How Domain Adaptation Builds Specialized LLMs

<div class="medium-feed-item"><p class="medium-feed-image"><a href="https://medium.com/@QuarkAndCode/fine-tuning-and-alignment-how-domain-adaptation-builds-specialized-llms-7c6d93f66937?source=rss------fine_tuning-5"><img src="https://cdn-images-1.medium.com/max/1024/1*D2kcjRNI5S…

Medium — fine-tuning tag TIER_1 English(EN) · QuarkAndCode · 2026-05-18 08:26

Why Pretrained LLMs Need Fine-Tuning for Better AI Performance

<div class="medium-feed-item"><p class="medium-feed-image"><a href="https://medium.com/@QuarkAndCode/why-pretrained-llms-need-fine-tuning-for-better-ai-performance-6541293f9fef?source=rss------fine_tuning-5"><img src="https://cdn-images-1.medium.com/max/1024/1*y3FRj0ALAXfwrMOzXPZ…

Medium — MLOps tag TIER_1 English(EN) · Charan Panthangi · 2026-05-18 04:38

Inference Optimization — How to Make LLMs Faster and Cheaper in Production

<div class="medium-feed-item"><p class="medium-feed-image"><a href="https://medium.com/@charan.panthangi/inference-optimization-how-to-make-llms-faster-and-cheaper-in-production-2778cd00d921?source=rss------mlops-5"><img src="https://cdn-images-1.medium.com/max/1200/1*tyCL0_ikRhY…

dev.to — LLM tag TIER_1 English(EN) · Mudassir Khan · 2026-07-01 19:54

LLM Cost Optimization: Cutting Inference Bills Without Killing Quality

<p>You can cut your LLM API spend by 50 to 90% without switching models or degrading output quality. The techniques exist, the docs are public, and most teams are not using them. Here is what actually moves the needle.</p> <h2> Where your LLM bill actually comes from </h2> <p>Eve…

dev.to — LLM tag TIER_1 English(EN) · Klinsmann R · 2026-07-01 09:51

Understanding How LLMs Work: From Text to Tokens, Embeddings, Transformers, and Predictions

<p>Artificial Intelligence is nothing new. It has been around since the early days of computing and has slowly evolved over time. But today, where we stand with Generative AI, or GenAI, it has become one of the most popular and widely adopted categories of advanced AI.<br /> At t…

dev.to — LLM tag TIER_1 English(EN) · Vladyslav Donchenko · 2026-06-29 07:05

"LLM Inference Optimization: The Line Item That Decides If Your AI Ships"

<p>Training gets the headlines. Inference gets the bill. If you run LLMs in production, inference is almost certainly your biggest AI line item — a meter running 24/7 on every request. The gap between naive and optimized serving is routinely <strong>5-10x in cost and 3-5x in late…

dev.to — LLM tag TIER_1 English(EN) · Etrit Neziri · 2026-06-28 21:04

LLM Function Calling: The Complete Guide for Building AI Tools

<h1> LLM Function Calling: The Complete Guide for Building AI Tools </h1> <p>Function calling (tool use) is the technology that turned LLMs from chatbots into agents. Here's the complete guide.</p> <h2> What Is Function Calling? </h2> <p>Function calling lets an LLM <strong>decid…

dev.to — LLM tag TIER_1 English(EN) · devtocash · 2026-06-27 20:21

Kubernetes LLM Inference: Deploy and Scale Open-Source LLMs in 2026

<p>Running your own LLMs on Kubernetes isn't just a cost play — it's about latency, data sovereignty, and fine-tuning control. But GPU scheduling at scale is a different beast entirely.</p> <p>Here's what a production K8s LLM inference stack looks like in 2026: vLLM or TGI for th…

dev.to — LLM tag TIER_1 English(EN) · pueding · 2026-06-27 11:21

OpenAI and Broadcom's Jalapeño, a Custom Inference ASIC: Inference ASIC vs GPU

<p> </p> <p><strong>What:</strong> The <strong>OpenAI and Broadcom Jalapeño announcement</strong> (June 24, 2026) is OpenAI's <strong>first custom LLM-inference ASIC</strong> — a reticle-sized compute chiplet paired with HBM, built to <strong>run</strong> models rather than train…

dev.to — LLM tag TIER_1 English(EN) · Lycore Development · 2026-06-27 10:19

Structured Outputs: How We Stopped Parsing LLM Responses by Hand

<p>Every team we talk to has a version of the same story. They built an LLM integration that works well in testing. Then, three weeks into production, something comes back slightly different — the model wraps the JSON in a code block, or uses <code>"status": "Completed"</code> in…

dev.to — LLM tag TIER_1 English(EN) · arya · 2026-06-27 00:08

What building an LLM inference engine from scratch taught me about compiler design

<p>the insight that started this project hit me while i was finishing a bytecode-compiled language i'd written in C</p> <p>i'd spent months building a hand-written lexer, a single-pass Pratt compiler, a stack VM with 35 opcodes, and a mark-and-sweep garbage collector. and right n…

dev.to — LLM tag TIER_1 English(EN) · Kuldeep Paul · 2026-06-26 18:42

A Guide to the Best Semantic Caching Tools for LLMs in 2026

<p><a class="article-body-image-wrapper" href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Foxxdppg4ygvpqbzrm0i4.png"><img alt="A Guide to the B…

dev.to — LLM tag TIER_1 English(EN) · Eric-Octavian · 2026-06-25 17:55

Training LLMs in the kernel — how IONA AI does embedding, RAG, and fine‑tuning without the cloud

<p>Most AI systems today are cloud‑based. You send a prompt to an API, and a model somewhere else generates a response. You don't control the model. You don't control the data. You don't control the infrastructure.</p> <p>IONA AI is the opposite.</p> <p>It runs inside the kernel …

dev.to — LLM tag TIER_1 English(EN) · zeromathai · 2026-06-25 14:15

Why KV Cache Matters — How MQA, GQA, and MLA Make LLM Inference Faster

<p>LLMs generate text one token at a time.</p> <p>That sounds simple.</p> <p>But without KV Cache, every new token would repeat a lot of old work.</p> <p>That is why inference optimization starts with keys and values.</p> <h2> Core Idea </h2> <p>KV Cache stores previously compute…
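The core idea in the excerpt above can be illustrated with a toy, pure-Python sketch. It is not taken from the article: `project` is a hypothetical stand-in for the learned key/value/query projections, and the attention is single-head with no batching. It only shows that appending each new token's key and value to a cache gives the same outputs as recomputing them for the whole prefix at every step.

```python
import math

def attend(q, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]

def project(x, scale):
    # Hypothetical stand-in for a learned linear projection (elementwise scaling).
    return [scale * xi for xi in x]

def decode_without_cache(tokens):
    outputs = []
    for t in range(len(tokens)):
        # Keys and values for the entire prefix are recomputed at every step.
        keys = [project(x, 0.5) for x in tokens[: t + 1]]
        values = [project(x, 2.0) for x in tokens[: t + 1]]
        outputs.append(attend(project(tokens[t], 1.0), keys, values))
    return outputs

def decode_with_cache(tokens):
    k_cache, v_cache, outputs = [], [], []
    for x in tokens:
        # Only the new token's key and value are computed; older ones are reused.
        k_cache.append(project(x, 0.5))
        v_cache.append(project(x, 2.0))
        outputs.append(attend(project(x, 1.0), k_cache, v_cache))
    return outputs

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cached = decode_with_cache(tokens)
uncached = decode_without_cache(tokens)
```

In this sketch the uncached loop performs a number of projections that grows quadratically with sequence length, while the cached loop performs a fixed number per token; the price is the memory held by `k_cache` and `v_cache`.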

dev.to — LLM tag TIER_1 English(EN) · globose technology solutions · 2026-06-25 12:18

Building Multilingual AI: LLM Dataset Best Practices

<p>Artificial intelligence has transformed the way businesses communicate, automate processes, and provide personalized customer experiences. As businesses grow to global markets, AI systems need to understand and produce content in many languages while maintaining cultural and r…

dev.to — LLM tag TIER_1 English(EN) · ironbyte-rgb · 2026-06-24 19:00

Real-time LLM Inference on Standard GPUs: 3k tokens/s per request

<h2> TL;DR </h2> <ul> <li>Real-time LLM inference on standard GPUs can reach 3k tokens/s per request</li> <li>Optimizing the whole software stack with architecture/engine/kernel co-design is crucial for fast inference</li> <li>Standard datacenter GPU hardware has a higher decodin…

r/LocalLLaMA TIER_1 English(EN) · /u/z_latent · 2026-06-24 14:22

OpenAI and Broadcom unveil LLM-optimized inference chip

<div class="md"><p><a href="https://openai.com/index/openai-broadcom-jalapeno-inference-chip/">https://openai.com/index/openai-broadcom-jalapeno-inference-chip/</a></p> <p>Quoted from the start of the blog post:</p> <ul> <li>Early testing shows that the first-gener…

dev.to — LLM tag TIER_1 English(EN) · Ashwin Giridharan · 2026-06-24 06:36

I built an interactive 11-chapter guide to how LLM inference actually works

<p>Production vLLM is 100,000+ lines of C++, CUDA, and Python. It powers most of the industry's LLM serving — but reading it cold is brutal.</p> <p>So I built a study series around <strong>nano-vLLM</strong>, an open-source reimplementation of vLLM's core ideas in ~1,200 lines of…

dev.to — LLM tag TIER_1 English(EN) · Manoj Krishna Mohan · 2026-06-23 05:43

I built a Rust entropy monitor to route LLM inference — here's what the benchmark showed

<p>Frontier LLM inference is expensive. I wanted to see how far a 4B local model could go before needing a cloud call — and when the cloud call actually adds value.</p> <p>The result is Buddy System: a tiered inference architecture where a Rust entropy monitor watches per-token u…

dev.to — LLM tag TIER_1 English(EN) · Zhongkai Fu · 2026-06-22 17:09

TensorSharp: .NET Native Open Source Local LLM Inference Engine

<p><a href="https://github.com/zhongkaifu/TensorSharp" rel="noopener noreferrer">TensorSharp</a><br /> I would like to share my latest open-source, .NET-native local LLM inference engine and applications. It supports many models, like Gemma4, DiffusionGemma, Qwen3.6 with multi-mod…

r/LocalLLaMA TIER_1 English(EN) · /u/carteakey · 2026-06-21 23:01

Local LLM Inference Optimization: The Complete Guide

<table> <tr><td> <a href="https://www.reddit.com/r/LocalLLaMA/comments/1uc3wg9/local_llm_inference_optimization_the_complete/"> <img alt="Local LLM Inference Optimization: The Complete Guide" src="https://external-preview.redd.it/s3zETEijR5VlGEv8jnAYlpIUtOJGtoxTXyjh8AaO6a0.png?wi…

r/MachineLearning TIER_1 English(EN) · /u/YouFirst295 · 2026-06-20 12:27

An open handbook on LLM inference at scale (GPU internals, KV cache, batching, vLLM/SGLang/TensorRT-LLM) [P]

<div class="md"><p>I've been working through the internals of LLM inference and writing up what I learn as an open, in-progress handbook.</p> <p>Just wrapped another chapter on GPU execution and memory internals: why a GPU sits mostly idle during inference, how the…

dev.to — LLM tag TIER_1 English(EN) · globose technology solutions · 2026-06-15 11:55

Mastering AI Performance Through Advanced LLM Dataset Strategies

<p>Artificial intelligence is changing the way businesses operate, innovate, and engage customers. From intelligent virtual assistants to content generation tools, predictive analytics, and enterprise automation, AI has become a catalyst for digital transformation. These developm…

dev.to — LLM tag TIER_1 English(EN) · HelperX · 2026-06-15 05:21

LLM Cost Optimization: How We Cut Reply Generation from $0.011 to $0.0009

<p>When we shipped the first version of AI-generated replies for <a href="https://helperx.app" rel="noopener noreferrer">HelperX</a>, each reply cost us about $0.011 in API spend. That sounds tiny until you multiply by 30 replies per slot per day times 200 active slots: roughly $…
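The arithmetic in the excerpt above can be sketched as follows. The excerpt is cut off before its own total, so this sketch does not try to reproduce that figure; it only multiplies the three numbers the excerpt states (cost per reply, replies per slot per day, active slots) and compares them with the per-reply cost from the headline. The function and variable names are illustrative, not the author's.

```python
def daily_cost(cost_per_reply, replies_per_slot_per_day, active_slots):
    """Daily API spend, assuming every slot sends its full quota of replies."""
    return cost_per_reply * replies_per_slot_per_day * active_slots

# Figures stated in the headline and excerpt.
before = daily_cost(0.011, 30, 200)
after = daily_cost(0.0009, 30, 200)

# Ratio of old to new spend; volume cancels out, so this is 0.011 / 0.0009.
reduction_factor = before / after

print(f"before: ${before:.2f}/day, after: ${after:.2f}/day, "
      f"reduction: {reduction_factor:.1f}x")
```

Because the reply volume appears in both terms, the reduction factor depends only on the two per-reply costs; the volume matters for the absolute daily saving.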

dev.to — LLM tag TIER_1 English(EN) · Nolan Vale · 2026-06-12 17:33

Token Cost Optimization: How to Cut LLM Inference Spend Without Cutting Quality

<p>There is a version of token cost optimization that I do not recommend: cutting token counts by reducing the quality of your system prompt, your retrieved context, or your response formatting. This approach reduces cost and reduces quality in equal measure. You have not optimiz…

dev.to — LLM tag TIER_1 English(EN) · globose technology solutions · 2026-06-12 12:22

The Role of High-Fidelity LLM Training Datasets in Modern Machine Learning

<p>Large Language Models (LLMs) have revolutionized artificial intelligence by enabling machines to seamlessly generate text, answer complex queries, and translate languages; however, the true catalyst behind these capabilities is high-fidelity training data. As organizations rap…

dev.to — LLM tag TIER_1 English(EN) · BAOFUFAN · 2026-06-10 12:06

Pitfalls of Testing LLM Long-Term Memory: A 3‑Day Debugging Saga

<p>I was jolted awake at 2 a.m. by a PagerDuty alert — users were complaining that the AI “called me Mr. Wang yesterday, but today it doesn’t recognise me at all.” Groggily I pulled up the monitoring dashboards and saw that the vector database’s retrieval latency had spiked, and …

dev.to — LLM tag TIER_1 English(EN) · globose technology solutions · 2026-06-10 11:34

Scaling Generative AI: Best Practices for LLM Dataset Curation and Annotation

<p>Generative AI has revolutionized industries by allowing machines to generate human-like text, images, audio, and code. Any successful Large Language Model (LLM) relies on high-quality data as its bedrock. As organizations accelerate their AI initiatives, effective dataset cura…

dev.to — LLM tag TIER_1 English(EN) · Manoranjan Rajguru · 2026-06-10 05:10

How Xiaomi Cracked 1,000 Tokens/Second on a 1-Trillion Parameter Model: A Deep Dive into LLM Inference Optimization

<blockquote> <p><strong>Meta Description:</strong> Xiaomi's MiMo-V2.5-Pro-UltraSpeed just shattered the 1,000 tokens/second barrier on a 1T-parameter model using commodity GPUs. This deep dive unpacks the FP4 quantization, DFlash speculative decoding, and TileRT persistent engine…

dev.to — LLM tag TIER_1 English(EN) · Kotcherla Murali Krishna · 2026-06-09 02:26

PagedAttention vs Traditional KV Cache: How vLLM Reinvented GPU Memory for LLM Inference

<p>A deep dive into memory fragmentation, paged memory management, and why PagedAttention can deliver up to 24× higher throughput than conventional KV cache implementations.</p> <p>Every token you generate during LLM inference silently eats GPU memory. With traditional KV caching…

dev.to — LLM tag TIER_1 Norsk(NO) · ItsEvilDuck · 2026-06-08 19:28

Fast LLM Token Counter: Estimate Tokens for GPT Models

<p>Today I'm sharing a new utility, the Fast LLM Token Counter. This tool is built to provide quick token count estimations for any given text input.</p> <p>It uses OpenAI's <code>tiktoken</code> library, which is the same method OpenAI uses. This allows for accurate predictions …

dev.to — LLM tag TIER_1 English(EN) · Abhinav Tripathi · 2026-06-08 12:38

Learning LLMs by training a 1B param model from scratch on Strix Halo

<p>About 1 year ago, AMD released their <a href="https://www.amd.com/en/products/processors/laptop/ryzen/ai-300-series/amd-ryzen-ai-max-plus-395.html" rel="noopener noreferrer">AI Max+ series CPUs</a> (aka <code>Strix Halo</code>). It seemed that all of my youtube feed was filled…

dev.to — LLM tag TIER_1 English(EN) · globose technology solutions · 2026-06-08 12:23

The Hidden Power Behind Generative AI: LLM Training Datasets

<p>Generative AI has transformed the way we create content, automate workflows, and interact with technology. From writing articles and generating code to creating realistic images and answering complex questions, Large Language Models (LLMs) are powering a new era of artificial …

dev.to — LLM tag TIER_1 English(EN) · soy · 2026-06-07 21:33

New `llama.cpp` Updates, AI Agents for Any LLM, and Quantized Vector Index for Local Inference

<h2> New <code>llama.cpp</code> Updates, AI Agents for Any LLM, and Quantized Vector Index for Local Inference </h2> <h3> Today's Highlights </h3> <p>Today's top stories highlight advancements in efficient local AI, starting with core <code>llama.cpp</code> updates for faster LLM…

r/LocalLLaMA TIER_1 English(EN) · /u/LMTLS5 · 2026-06-06 18:12

Made inference focused library for Zero Order optimization of LLMs. built on GGML. 39x faster forward pass and 15x faster on one MeZo step. [P]

<div class="md"><p></p> <p>i felt like zero order optimization in PyTorch was needlessly slow and tough. i am working on zero order optimization so i built this. mostly vibe coded but design choices were mine and yes i read every single line of code before …

dev.to — LLM tag TIER_1 한국어(KO) · HyunSeok Jeong · 2026-06-06 04:44

100 Ad Copies with LLM + CTR Prediction - A 4-Step Workflow for Operators

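<blockquote> <p>"Please generate just 30 more copies for this campaign": this one line, once a marketer's standing request, has taken on a different meaning since GPT and Claude arrived. Now even 100 copies take only five minutes. But if you actually load all 100 into the ad manager, the learning gets spread too thin and similar copies cannibalize one another, wrecking the results. This post is a four-step pipeline for operators that takes LLM-mass-produced copy through <strong>deduplication → pre-scoring → A/B candidate selection</strong>.</p> </blockqu…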

r/LocalLLaMA TIER_1 English(EN) · /u/Sisuuu · 2026-06-04 15:02

Qwen3.6-27B on 2x3090s: llama.cpp vs vLLM, all the flags, and the MTP acceptance/inference speed/context

<div class="md"><h1>written 20%-ish by me and 80% by Claude code</h1> <p>Spent basically a whole day getting my box to run Qwen3.6-27B as one OpenAI-compatible endpoint that hot-swaps between four quant/backend combos (llama.cpp Q6_K and Q8_0, vLLM INT4 and INT8). …

dev.to — LLM tag TIER_1 English(EN) · soy · 2026-06-02 21:33

Local LLM Advances: Holo3.1 Agents, Headroom Token Compression & Open-LLM-VTuber for Local Inference

<h2> Local LLM Advances: Holo3.1 Agents, Headroom Token Compression & Open-LLM-VTuber for Local Inference </h2> <h3> Today's Highlights </h3> <p>This week's top stories highlight practical tools and techniques for enhancing local LLM performance and deployment, from efficient…

dev.to — LLM tag TIER_1 English(EN) · No One · 2026-06-02 19:00

Request-Based vs Token Pricing for LLM Inference in 2026

<p>By 2026, the default assumption for LLM inference pricing is still token-based billing. You count input tokens, output tokens, and occasionally tokens spilled across tool calls or retrieval context. For short prompts this feels manageable, but as context windows stretch into t…

r/LocalLLaMA TIER_1 English(EN) · /u/yogthos · 2026-06-02 17:28

Putting Code Under a Microscope: Wavelet-Based Context for LLMs

  submitted by   <a href="https://www.reddit.com/user/yogthos"> /u/yogthos </a> <br /> <span><a href="https://yogthos.net/posts/2026-06-02-wavescope.html">[link]</a></span>   <span><a href="https://www.reddit.com/r/LocalLLaMA/comments/1tuxwhs/putting_code_under_a_micr…

dev.to — LLM tag TIER_1 Русский(RU) · Promptra Team · 2026-06-01 19:17

Comparison of the Top 5 LLM Models of 2026: Price, Benchmarks, Real-World Application

<p>If in 2024 the LLM API market could still be called a "duopoly of OpenAI + Anthropic with Google catching up," by May 2026 the landscape has split into four distinct leagues: premium reasoning (Claude Opus 4.7, GPT-5.5), a long-context value tier (Claude Sonnet 4.6, Gemini 3 Pro), a…

dev.to — LLM tag TIER_1 (CA) · TildAlice · 2026-06-01 15:04

LLM Tokenization: GPT vs Claude vs Llama Edge Cases

<h2> The 🤗 Emoji Cost Me $47 in API Calls </h2> <p>I ran a batch job that sent 10,000 user-generated messages to GPT-4. The average message was about 200 characters. I budgeted for ~50 tokens per message based on the "~4 characters per token" rule everyone quotes.</p> <p>Actual c…
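The budgeting rule the excerpt above describes can be written down as a short sketch. It uses only the numbers in the excerpt (10,000 messages, about 200 characters each, about 4 characters per token); the excerpt is truncated before the actual token count, so that figure is not reproduced here. The function names are illustrative, and `chars_per_token` is exactly the assumption the article says broke down.

```python
def estimate_tokens(num_chars, chars_per_token=4.0):
    """Rule-of-thumb token estimate; not a real tokenizer."""
    return num_chars / chars_per_token

def estimate_batch_tokens(num_messages, avg_chars, chars_per_token=4.0):
    """Total estimated tokens for a batch of similarly sized messages."""
    return num_messages * estimate_tokens(avg_chars, chars_per_token)

per_message = estimate_tokens(200)           # the ~50 tokens budgeted per message
batch = estimate_batch_tokens(10_000, 200)   # the budget for the whole job

# Hypothetical sensitivity check: if real text averaged 2 characters per
# token (for example, emoji-heavy input), the same batch would cost twice as much.
pessimistic = estimate_batch_tokens(10_000, 200, chars_per_token=2.0)
```

The estimate is linear in `1 / chars_per_token`, so any input whose real ratio falls below 4 inflates the bill proportionally. Counting tokens with the provider's own tokenizer avoids relying on the heuristic.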

dev.to — LLM tag TIER_1 English(EN) · Becomer.net · 2026-06-01 14:35

How I built a zero-token memory layer for LLMs (and why it outperforms vector store approaches)

<p>If you've built an AI chatbot or agent, you've hit the same problem: the LLM forgets everything between sessions. The standard solution is to stuff your conversation history into a vector store and retrieve relevant chunks before each call. It works — but it has a hidden cost.…

dev.to — LLM tag TIER_1 English(EN) · Samir Yuja · 2026-06-01 13:17

Futbol Report — building a multi-model LLM comparison on AWS Lambda

<p><em>Originally posted at <a href="https://samiryuja.dev/blog/futbol-report-multi-model-eval" rel="noopener noreferrer">samiryuja.dev</a>.</em></p> <p>A few months ago I set up a soccer-digest bot that sends me a Telegram message every few days with fixtures, results, transfer …

dev.to — LLM tag TIER_1 English(EN) · globose technology solutions · 2026-06-01 10:45

Powering Next-Generation AI with High-Quality LLM Datasets

<p><strong>Introduction</strong><br /> Artificial Intelligence (AI) is rapidly transforming industries by enabling machines to understand, process, and generate human-like language. At the heart of this transformation are Large Language Models (LLMs), which power applications suc…

r/LocalLLaMA TIER_1 English(EN) · /u/Thrumpwart · 2026-05-31 23:14

Semantic Step Prediction: Multi-Step Latent Forecasting in LLM Reasoning Trajectories via Step Sampling

  submitted by   <a href="https://www.reddit.com/user/Thrumpwart"> /u/Thrumpwart </a> <br /> <span><a href="https://arxiv.org/abs/2604.18464">[link]</a></span>   <span><a href="https://www.reddit.com/r/LocalLLaMA/comments/1ttalm9/semantic_step_prediction_multistep_lat…

dev.to — LLM tag TIER_1 English(EN) · Ayi NEDJIMI · 2026-05-30 10:03

Pydantic AI vs LangChain vs instructor: structured LLM outputs compared

<p>Getting structured data out of a language model reliably is harder than it looks. The model might return JSON that's almost valid, skip required fields, or wrap the object in a markdown block. Three Python libraries try to solve this differently: <strong>instructor</strong>, <…

dev.to — LLM tag TIER_1 Français(FR) · Paul SANTUS · 2026-05-29 12:41

Generating structured data with an LLM: a few tips for greater reliability

<p>LLMs are excellent at generating text. They are bad at reliably generating structured data. If you have ever tried to get an agent to produce a JSON object with a precise schema, you know the painful result: missing fields, keys h…

r/MachineLearning TIER_1 English(EN) · /u/averne_ · 2026-05-29 08:54

Building a monokernel for LLM inference on AMD MI300X - up to 3,300 output tokens/s per request [P]

<div class="md"><p>We built a monokernel that runs the full decode sequence as one GPU-resident program on AMD MI300X, with some neat optimizations. The die topology is central to the result, we map memory access patterns to the physical layout, compute units group…

dev.to — LLM tag TIER_1 English(EN) · synthorai · 2026-05-27 15:30

LLM Prompt Caching: The Complete 2026 Guide

<p>If you ship a chatbot, a RAG app, or an AI agent against a large language model, prompt caching is the single optimization that gives you back <strong>50–90% of input cost and 3–10× of time-to-first-token</strong> at no quality cost. It isn't a bolt-on trick — it falls directl…

Mastodon — fosstodon.org TIER_1 Русский(RU) · [email protected] · 2026-05-27 07:22

[Translation] Scaling LLMs: From a Single Chip to the Data Center. Chapter 3. Transformers This is a continuation of the series of articles on scaling LLM training and inference. Previous

[Translation] Scaling LLMs: From a Single Chip to the Data Center. Chapter 3. Transformers. This is a continuation of the series of articles on scaling LLM training and inference. Previous article. Now let's move on to something more practical, namely how many FLOPs and bytes are needed to run a transf…

LINKS habr.com/…/1039208

dev.to — LLM tag TIER_1 English(EN) · Quratulain Nayeem · 2026-05-26 16:46

Beyond Prompting: Building a 4-Stage LLM Compiler with Surgical Self-Repair

<p>A single prompt often yields inconsistent, unvalidated AI output. To fix this, I built <strong>Compyl</strong>, a multi-stage LLM compiler that takes English text and converts it into a directly usable JSON blueprint.</p> <p>Compyl converts plain English into a complete, valid…

dev.to — LLM tag TIER_1 English(EN) · pixelbank dev · 2026-05-25 23:10

Applications of LLMs — Deep Dive + Problem: Information Gain

<p><em>A daily deep dive into LLM topics, coding problems, and platform features from <a href="https://pixelbank.dev" rel="noopener noreferrer">PixelBank</a>.</em></p> <h2> Topic Deep Dive: Applications of LLMs </h2> <p><em>From the Introduction to LLMs chapter</em></p> <h2> Intr…

dev.to — LLM tag TIER_1 English(EN) · David Moores · 2026-05-25 18:33

Benchmarking LLM Structured Outputs

<blockquote> <p>Cross-posted from <a href="https://carrick.tools/blog/benchmarking-llm-structured-outputs/" rel="noopener noreferrer">carrick.tools</a>.</p> </blockquote> <p>When you read the API documentation for OpenAI, Anthropic, or Google Gemini, the feature called "structure…

dev.to — LLM tag TIER_1 English(EN) · Mustafa ERBAY · 2026-05-22 16:11

LLM Inference Caching: How to Balance Cost and Latency?

<h2> Introduction to LLM Inference Caching: Why It Matters? </h2> <p>When working with Large Language Models (LLMs), especially as you start using them in production environments, one of the first major challenges you'll face is the delicate balance between cost and latency. LLMs…

dev.to — LLM tag TIER_1 English(EN) · Nishkarsh Sahu · 2026-05-19 18:33

Building a Rails-Native AI Abstraction Layer for Local and Hosted LLMs

<p>Recently I’ve been experimenting with integrating local AI runtimes into Rails applications using tools like Ollama and LM Studio.</p> <p>At first, the integration looked straightforward:<br /> make an HTTP request, stream the response, and return the generated text.</p> <p>Bu…

dev.to — LLM tag TIER_1 English(EN) · Kotcherla Murali Krishna · 2026-05-19 17:43

Modular LLM Inference Engine from Scratch

<p>Why vLLM, TensorRT-LLM, and llama.cpp each solve only part of the problem — and how I built inferx to fill the gap. Runs on any laptop, no GPU needed.</p> <p>I spent the last few months building inferx — an open-source LLM inference optimization library that runs on any machin…

Mastodon — mastodon.social TIER_1 Deutsch(DE) · aisyndicate · 2026-06-03 14:30

LLM Inference, Quantization, and Local AI: Where Quality is Truly Lost Quantization Often Seems Harmless, but Flips Show: Same Accuracy Can Mean Different Things

LLM inference, quantization, and local AI: where quality is really lost. Quantization often seems harmless, but flips show that the same accuracy can mask different behavior. For local AI, drift matters more than benchmarks. https://aisyndicate.ch/llm-inferenz-quantisierung-…

LINKS aisyndicate.ch/llm-inferenz-quantisierung…

Mastodon — mastodon.social TIER_1 Deutsch(DE) · [email protected] · 2026-05-30 16:04

RT @PavloMolchanov: 🚀 Self-speculation enables a 6.75x real-time acceleration of LLM generation with SGLang inference! More on Arint.info # AI # Di

RT @PavloMolchanov: 🚀 Self-speculation enables a 6.75x real speedup of LLM generation with SGLang inference! More at Arint.info #AI #Diffusion #LLM #MachineLearning #Nemotron #SGLang #arint_info https://x.com/PavloMolchanov/status/2060245957254824246…

Mastodon — mastodon.social TIER_1 Deutsch(DE) · [email protected] · 2026-05-29 16:03

RT @PavloMolchanov: 🚀 Self-speculation enables a 6.75x real-world acceleration of LLM generation with SGLang inference! more at Arint.info # AI # Dee

RT @PavloMolchanov: 🚀 Self-speculation enables a 6.75x real-world speedup of LLM generation with SGLang inference! More at Arint.info #AI #DeepLearning #Innovation #LLM #MachineLearning #NLP #arint_info https://x.com/PavloMolchanov/status/206024595725482424…

Mastodon — mastodon.social TIER_1 Русский(RU) · [email protected] · 2026-05-25 07:22

LLM Scaling: From a Single Chip to the Data Center. Chapter 2. Sharding This is a continuation of the series of articles on scaling LLM training and inference. Previous chapter

[Translation] Scaling LLMs: From a Single Chip to the Data Center. Chapter 2. Sharding. This is a continuation of the series of articles on scaling LLM training and inference. The previous chapter is at this link. So, with the basics covered, let's now work out how to spread the matrices across …

LINKS habr.com/…/1037918

r/singularity TIER_2 English(EN) · /u/yogthos · 2026-06-24 19:07

DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference

<table> <tr><td> <a href="https://www.reddit.com/r/singularity/comments/1uemrgy/dualpath_breaking_the_storage_bandwidth/"> <img alt="DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference" src="https://external-preview.redd.it/q3evP6JeDpAC2MdSQHWYxnCYTqbJkEl…

r/singularity TIER_2 English(EN) · /u/Distinct-Question-16 · 2026-06-24 14:07

OpenAI and Broadcom unveil LLM-optimized inference chip

<div class="md"><p>“We optimized the architecture around the kernels, memory movement, networking, and serving patterns that matter most for frontier AI models. Based on early testing, Jalapeño will efficiently execute our most important workloads close to the hard…

COVERAGE [564]