LLM inference and reasoning techniques are evolving with new research and hardware advances

OpenAI News TIER_1 English(EN) · 2026-06-24 06:00

OpenAI and Broadcom unveil an inference chip optimized for LLMs

OpenAI and Broadcom introduce Jalapeño, a custom AI chip built for LLM inference to improve performance, efficiency, and scale across AI systems.

Google AI / Research TIER_1 English(EN) · 2026-03-04 20:29

Teaching LLMs to reason like Bayesians

Generative AI

Google AI / Research TIER_1 English(EN) · 2025-09-11 22:01

Speculative cascades: a hybrid approach for smarter, faster LLM inference

Generative AI

arXiv cs.CL TIER_1 English(EN) · Ádám Kovács, Nadia Verdha, Gábor Recski · 2026-07-03 04:00

RuleChef: Grounding LLM Task Knowledge in Human-Editable Rules

arXiv:2607.01293v1 Announce Type: new Abstract: We present RuleChef, a framework that uses large language models (LLMs) to generate executable rules for NLP tasks such as text classification, Named Entity Recognition (NER), or relation extraction. Rules are generated based on a t…

arXiv cs.AI TIER_1 English(EN) · Tingting Yu, Pei-Cing Huang, Chan Hsu, Chan-Tung Ku, Yihuang Kang · 2026-07-03 04:00

ADVENT: LLM-Driven Automatic Predicate Invention for ILP

arXiv:2607.01585v1 Announce Type: cross Abstract: Predicate invention (PI), the creation of new predicates to extend the hypothesis space, remains a critical bottleneck in Inductive Logic Programming (ILP). Existing methods rely on domain expertise and produce semantically opaque…

arXiv cs.AI TIER_1 English(EN) · Samir Abdaljalil, Erchin Serpedin, Hasan Kurban · 2026-07-03 04:00

IsoSci: A Benchmark of Isomorphic Cross-Domain Science Problems for Evaluating Reasoning versus Knowledge Retrieval in LLMs

arXiv:2607.01431v1 Announce Type: cross Abstract: We introduce ISOSCI, a benchmark of isomorphic cross-domain science problem pairs that separates reasoning ability from domain knowledge retrieval in LLM evaluation. Each pair shares identical logical structure but requires differ…

arXiv cs.AI TIER_1 English(EN) · Yanjun Zhao, Ruizhong Qiu, Tianxin Wei, Yuanchen Bei, Zhining Liu, Lingjie Chen, Ismini Lourentzou, Hanghang Tong, Jingrui He · 2026-07-03 04:00

ReContext: Recursive Evidence Replay as LLM Harness for Long-Context Reasoning

arXiv:2607.02509v1 Announce Type: new Abstract: Understanding and reasoning over long contexts has become a key requirement for deploying large language models (LLMs) in realistic applications. Although recent LLMs support increasingly long context windows, they often fail to use…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-07-02 17:59

ReContext: Recursive Evidence Replay as LLM Harness for Long-Context Reasoning

Understanding and reasoning over long contexts has become a key requirement for deploying large language models (LLMs) in realistic applications. Although recent LLMs support increasingly long context windows, they often fail to use relevant evidence that is already present in th…

arXiv cs.AI TIER_1 English(EN) · Jingrui He · 2026-07-02 17:59

ReContext: Recursive Evidence Replay as LLM Harness for Long-Context Reasoning

Understanding and reasoning over long contexts has become a key requirement for deploying large language models (LLMs) in realistic applications. Although recent LLMs support increasingly long context windows, they often fail to use relevant evidence that is already present in th…

arXiv cs.AI TIER_1 English(EN) · Woosung Koh, Juyoung Suk, Sungjun Han, Se-Young Yun, Jamin Shin · 2026-07-02 04:00

Predicting LLM Reasoning Performance with Small Proxy Models

arXiv:2509.21013v4 Announce Type: replace-cross Abstract: Given the prohibitive cost of pre-training large language models, it is essential to leverage smaller proxy models to optimize datasets before scaling up. However, this approach becomes challenging for reasoning capabiliti…

arXiv cs.AI TIER_1 English(EN) · Fatima Jahara, Mark Dredze, Sharon Levy · 2026-07-02 04:00

Evaluating Implicit Bias in LLM Reasoning through Logic Grid Puzzles

arXiv:2511.06160v2 Announce Type: replace Abstract: While recent safety guardrails effectively suppress overtly biased outputs, subtler forms of social bias emerge during complex logical reasoning tasks that evade current evaluation benchmarks. To fill this gap, we introduce a ne…

arXiv cs.CL TIER_1 English(EN) · Yao Dou, Benjamin Mamut, Wei Xu · 2026-07-02 04:00

Gavel: Agent Meets Checklist for Evaluating LLMs on Long-Context Legal Summarization

arXiv:2601.04424v2 Announce Type: replace Abstract: Large language models (LLMs) now support contexts of up to 1M tokens, but their strengths and weaknesses on complex long-context tasks remain unclear. To study this, we focus on multi-document legal case summarization, where a s…

arXiv cs.CL TIER_1 English(EN) · Yujia Hu, Tuan-Phong Nguyen, Shrestha Ghosh, Moritz Müller, Simon Razniewski · 2026-07-02 04:00

GPTKB v1.5: A Massive Knowledge Base for Exploring Factual LLM Knowledge

arXiv:2507.05740v2 Announce Type: replace Abstract: Language models are powerful artifacts, yet their factual knowledge is still poorly understood, and inaccessible to ad-hoc browsing and scalable statistical analysis. This demonstration introduces GPTKB v1.5, a densely interlink…

arXiv cs.CL TIER_1 English(EN) · Yihuang Kang · 2026-07-02 01:33

ADVENT: LLM-Driven Automatic Predicate Invention for ILP

Predicate invention (PI), the creation of new predicates to extend the hypothesis space, remains a critical bottleneck in Inductive Logic Programming (ILP). Existing methods rely on domain expertise and produce semantically opaque predicates, hindering adaptation to unfamiliar do…

arXiv cs.CL TIER_1 English(EN) · Hasan Kurban · 2026-07-01 19:49

IsoSci: A Benchmark of Isomorphic Cross-Domain Science Problems for Evaluating Reasoning versus Knowledge Retrieval in LLMs

We introduce ISOSCI, a benchmark of isomorphic cross-domain science problem pairs that separates reasoning ability from domain knowledge retrieval in LLM evaluation. Each pair shares identical logical structure but requires different domain-specific knowledge, enabling controlled…

arXiv cs.AI TIER_1 English(EN) · Zhaoyang Luo, Runmin Dong, Miao Yang, Fan Wei, Yushan Lai, Bin Luo, Haohuan Fu · 2026-07-01 04:00

Attend, Transform, or Stay Silent: Operator-Level Visual Skipping for Efficient Multimodal LLM Inference

arXiv:2606.31903v1 Announce Type: cross Abstract: Multimodal large language models (MLLMs) increasingly process long visual-token sequences, increasing the overall inference computation. Existing acceleration methods usually remove visual tokens or skip visual-token updates in en…

arXiv cs.LG TIER_1 English(EN) · Hongmin Li · 2026-07-01 04:00

Directed Testing of LLM Reasoning: An Audit-Constrained Protocol

arXiv:2605.11599v3 Announce Type: replace Abstract: Fixed reasoning benchmarks evaluate canonical prompts, but semantically valid changes in presentation can still change model behavior. Studies of prompt variation can reveal such failures, but without audit they can mix genuine …

arXiv cs.CL TIER_1 English(EN) · Xudong Shen, Li Yuan, Ye Chen, Xin Wu, Yi Cai, Zhiyong Wu · 2026-07-01 04:00

Truth or Sophistry? LoFa: A Benchmark for LLM Robustness to Logical Fallacies

arXiv:2606.31039v1 Announce Type: new Abstract: Large Language Models (LLMs) exhibit strong semantic capabilities, yet their resilience to manipulative linguistic patterns such as logical fallacies remains underexplored. Prior work has primarily examined whether LLMs can identify…

arXiv cs.AI TIER_1 English(EN) · Zijun Di, Bin Lu, Huquan Kang, Luoyi Fu, Jiaxin Ding, Xiaoying Gan, Lei Zhou, Xinbing Wang · 2026-07-01 04:00

Improving LLM Reasoning with Homophily-Aware Structural and Semantic Text-Attributed Graph Compression

arXiv:2601.08187v3 Announce Type: replace Abstract: Large language models (LLMs) have demonstrated promising capabilities in Text-Attributed Graph (TAG) understanding. Recent studies typically focus on verbalizing the graph structures via handcrafted prompts, feeding the target n…

arXiv cs.AI TIER_1 English(EN) · Ankur Samanta, Akshayaa Magesh, Tal Lancewicki, Ayush Jain, Youliang Yu, Paul Sajda, Kaveh Hassani, Aditya Modi, Daniel R. Jiang, Yonathan Efroni · 2026-07-01 04:00

BayesBench: Evaluating LLM Belief Trajectories under Multi-Turn Evidence Accumulation

arXiv:2606.30850v1 Announce Type: new Abstract: Large language models (LLMs) are typically deployed in multi-turn conversations, where each turn provides new evidence that should reduce epistemic uncertainty about their environment. Acting rationally then requires inferring the u…

arXiv cs.AI TIER_1 English(EN) · Chao Wang, Hongtao Tian, Tao Yang, Yunsheng Shi, Ting Yao, Wenbo Ding · 2026-06-30 04:00

Process Advantage Signal Shaping: A Paradigm-Agnostic Middleware for Process-Supervised RL in LLM Reasoners

arXiv:2606.29296v1 Announce Type: new Abstract: Group Relative Policy Optimization (GRPO) is a default recipe for process-supervised reinforcement learning of LLM reasoners, and dense process supervision -- via learned process reward models (PRMs) or on-policy-distillation KL sig…

arXiv cs.LG TIER_1 English(EN) · Pei-Chi Pan, Yingbin Liang, Sen Lin · 2026-06-30 04:00

Reward Modeling for Reinforcement Learning-Based LLM Reasoning: Design, Challenges, and Evaluation

arXiv:2602.09305v2 Announce Type: replace Abstract: Large Language Models (LLMs) demonstrate transformative potential, yet their reasoning remains inconsistent and unreliable. Reinforcement learning (RL)-based fine-tuning is a key mechanism for improvement, but its effectiveness …

arXiv cs.LG TIER_1 English(EN) · Jinda Lu, Kexin Huang, Junkang Wu, Shuo Yang, Jinghan Li, Chiyu Ma, Shaohang Wei, Xiang Wang, Guoyin Wang, Jingren Zhou · 2026-06-30 04:00

Experience Augmented Policy Optimization for LLM Reasoning

arXiv:2606.30420v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) is a powerful paradigm for improving the reasoning capabilities of large language models (LLMs). However, existing RLVR methods typically rely on on-policy optimization from scra…

arXiv cs.CL TIER_1 English(EN) · Jiaqi Li, Fanghui Song · 2026-06-30 04:00

Grounding LLM Reasoning under Incomplete Graph Evidence

arXiv:2606.30247v1 Announce Type: new Abstract: Knowledge graphs can guide large language models (LLMs) reasoning, but the graph seen by a system is usually a retrieved, linked, temporally scoped, and incomplete evidence state rather than a complete account of truth. We develop a…

arXiv cs.AI TIER_1 English(EN) · Sirui Li, Shuhan Xiao, Mihir Joshi, Ahmed Metwally, Daniel McDuff, Wei Wang, Yuzhe Yang · 2026-06-30 04:00

HEARTS: Benchmarking LLM Reasoning on Health Time Series

arXiv:2603.06638v3 Announce Type: replace-cross Abstract: The rise of large language models (LLMs) has shifted time series analysis from narrow analytics to general-purpose reasoning. Yet, existing benchmarks cover only a small set of health time series modalities and tasks, fail…

arXiv cs.AI TIER_1 English(EN) · Tiancheng Xing, Jerry Li, Yixuan Du, Xiyang Hu · 2026-06-30 04:00

Are LLMs Reliable Rankers? Rank Manipulation via Two-Stage Token Optimization

arXiv:2510.06732v2 Announce Type: replace-cross Abstract: Large language models (LLMs) are increasingly used as rerankers in information retrieval, yet their ranking behavior can be steered by small, natural-sounding prompts. To expose this vulnerability, we present Rank Anything…

arXiv cs.AI TIER_1 English(EN) · Maohao Ran, Zhenglin Wan, Cooper Lin, Yanting Zhang, Hongyu Xin, Hongwei Fan, Yibo Xu, Beier Luo, Yaxin Zhou, Wangbo Zhao, Lijie Yang, Lang Feng, Fuchao Yang, Jingxuan Wu, Yiqiao Huang, Chendong Ma, Yusen Huang, Dailing Jiang, Jianbo Deng, Sirui Han, Yan… · 2026-06-30 04:00

CaveAgent: Transforming LLMs into Stateful Runtime Operators

arXiv:2601.01569v4 Announce Type: replace Abstract: LLM-based agents are increasingly capable of complex task execution, yet current agentic systems remain constrained by text-centric paradigms that struggle with long-horizon tasks due to fragile multi-turn dependencies and conte…

arXiv cs.AI TIER_1 English(EN) · Marco Aruta, Francesco Improta, Vadim Malvone, Aniello Murano, Vladana Perlic · 2026-06-30 04:00

Translating Natural Language to Strategic Temporal Specifications via LLMs

arXiv:2606.30441v1 Announce Type: cross Abstract: A rigorous formalization of system requirements is a fundamental prerequisite for the verification of Multi-Agent Systems (MAS). However, writing correct formal specifications is well known as an error-prone, time-consuming, and e…

arXiv cs.AI TIER_1 English(EN) · Xiteng Yao, Taeho Kim, Hengzhi Pei, Xinle Liu, Kyle Ulrich, Leonard Lausen, Ashish Khetan, Xiang Song, George Karypis, Martin Herbordt · 2026-06-30 04:00

KernelSight-LM: A Kernel-Level LLM Inference Simulator

arXiv:2606.28565v1 Announce Type: cross Abstract: As large language models (LLMs) move into production serving, practitioners must rapidly evaluate inference performance across diverse hardware, models, and serving parameters to meet cost and latency targets. However, the end-to-…

arXiv cs.AI TIER_1 English(EN) · Jingyao Liu, Danling Meng, Chen Huang, Yukun Yan, Zhenghao Liu, Wenqiang Lei, See-Kiong Ng, Maosong Sun · 2026-06-30 04:00

HippoSpark: An On-Demand Experience System for LLM Reasoning

arXiv:2606.29929v1 Announce Type: new Abstract: Distilling historical trajectories into reusable experience to enhance future problem-solving has become a focal point of recent LLM research. However, existing methods predominantly operate at the task level, leveraging general sum…

arXiv cs.AI TIER_1 English(EN) · Tianlong Wang, Yuhang Wang, Weibin Liao, Xin Gao, Xinyu Ma, Yang Lin, Yasha Wang, Liantao Ma · 2026-06-30 04:00

Seeking Truth from Reasoning: A Dynamic Representation Editing Framework for Steering LLM Trajectories

arXiv:2606.28589v1 Announce Type: new Abstract: Current approaches to enhance Large Language Model (LLM) reasoning, such as Chain-of-Thought and "Wait" prompts, primarily encourage models to think more, yet often fail to guide them toward Truth. While Representation Editing (RepE…

arXiv cs.CL TIER_1 English(EN) · Zhiyong Wu · 2026-06-30 02:17

Truth or Sophistry? LoFa: A Benchmark for LLM Robustness to Logical Fallacies

Large Language Models (LLMs) exhibit strong semantic capabilities, yet their resilience to manipulative linguistic patterns such as logical fallacies remains underexplored. Prior work has primarily examined whether LLMs can identify or classify fallacies, leaving their robustness…

arXiv cs.MA (Multiagent) TIER_1 English(EN) · Vladana Perlić · 2026-06-29 15:15

Translating Natural Language to Strategic Temporal Specifications via LLMs

A rigorous formalization of system requirements is a fundamental prerequisite for the verification of Multi-Agent Systems (MAS). However, writing correct formal specifications is well known as an error-prone, time-consuming, and expertise-intensive task. This difficulty is furthe…

arXiv cs.LG TIER_1 English(EN) · Jingren Zhou · 2026-06-29 15:05

Experience Augmented Policy Optimization for LLM Reasoning

Reinforcement Learning with Verifiable Rewards (RLVR) is a powerful paradigm for improving the reasoning capabilities of large language models (LLMs). However, existing RLVR methods typically rely on on-policy optimization from scratch, resulting in high sampling costs and ineffi…

arXiv cs.CL TIER_1 English(EN) · Fanghui Song · 2026-06-29 12:56

Grounding LLM Reasoning under Incomplete Graph Evidence

Knowledge graphs can guide large language models (LLMs) reasoning, but the graph seen by a system is usually a retrieved, linked, temporally scoped, and incomplete evidence state rather than a complete account of truth. We develop a theoretical perspective on grounding observable…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-29 12:56

Grounding LLM Reasoning under Incomplete Graph Evidence

Knowledge graphs can guide large language models (LLMs) reasoning, but the graph seen by a system is usually a retrieved, linked, temporally scoped, and incomplete evidence state rather than a complete account of truth. We develop a theoretical perspective on grounding observable…

arXiv cs.AI TIER_1 English(EN) · Yuzhe Wang, Yaochen Zhu, Jundong Li · 2026-06-29 04:00

CausalFlip: A Benchmark for LLM Causal Judgment Beyond Semantic Matching

arXiv:2602.20094v2 Announce Type: replace Abstract: As large language models (LLMs) witness increasing deployment in complex, high-stakes decision-making scenarios, it becomes imperative to ground their reasoning in causality rather than spurious correlations. However, strong per…

arXiv cs.AI TIER_1 English(EN) · Yuhang Chen, Jinhao Duan, Ruichen Zhang, Mingfu Liang, Xiaohan Wei, Yunchen Pu, Fei Tian, Chonglin Sun, Parish Aggarwal, Frank Shyu, Luke Simon, Sandeep Pandey, Tianlong Chen, Xi Liu · 2026-06-29 04:00

End-to-End Dynamic Sparsification for Resource-Adaptive LLM Inference

arXiv:2606.27743v1 Announce Type: cross Abstract: Large Language Models (LLMs) inference is typically deployed under a static resource assumption, where models execute a fixed computational graph regardless of the runtime environment. However, real-world cloud infrastructure is i…

arXiv cs.CL TIER_1 English(EN) · Carrie Chen · 2026-06-29 04:00

EntMTP: Accelerating LLM Inference via Entropy-Guided Multi-Token Prediction

arXiv:2606.27550v1 Announce Type: new Abstract: Multi-token prediction has been shown to increase data density during training, improve downstream text-generation quality, and serves as the defacto approach for self-speculative decoding. Existing foundation and open source models…

arXiv cs.AI TIER_1 English(EN) · Yiheng Tao, Yihe Zhang, Matthew Dearing, Xin Wang, Yuping Fan, Michael E. Papka, Zhiling Lan · 2026-06-29 04:00

Rank Before Serving: Low-Latency LLM Serving via Pairwise Learning-to-Rank

arXiv:2510.03243v3 Announce Type: replace-cross Abstract: Efficient scheduling of large language model (LLM) inference tasks is critical for achieving low latency and high throughput, a challenge that is becoming increasingly acute with the rise of reasoning-capable LLMs whose ge…

arXiv cs.IR (Information Retrieval) TIER_1 English(EN) · Xi Liu · 2026-06-26 05:48

End-to-End Dynamic Sparsification for Resource-Adaptive LLM Inference

Large Language Models (LLMs) inference is typically deployed under a static resource assumption, where models execute a fixed computational graph regardless of the runtime environment. However, real-world cloud infrastructure is inherently dynamic, characterized by fluctuating av…

arXiv cs.AI TIER_1 English(EN) · Derek Thomas · 2026-06-26 04:00

Context Recycling for Long-Horizon LLM Reasoning

arXiv:2606.26105v1 Announce Type: cross Abstract: Large language models (LLMs) exhibit strong capabilities in short-context reasoning but degrade in performance over long conversational horizons due to context window limitations and inefficient token usage. We introduce ContextFo…

arXiv cs.CL TIER_1 English(EN) · Jinghan Wang, Yanjun Chen, Wei Zhang, Xiaotong Huang, Tianchen Liu, Gaoliang Peng · 2026-06-26 04:00

Cascaded Multi-Granularity Pruning for On-Device LLM Inference in the Industrial Internet of Things

arXiv:2606.26861v1 Announce Type: new Abstract: Deploying large language models (LLMs) on Industrial Internet of Things (IIoT) edge devices demands extreme compression, yet existing structured pruning methods collapse at high compression ratios due to one-shot importance estimati…

arXiv cs.AI TIER_1 English(EN) · Haoqian Meng, Yilun Luo, Yafei Zhao, Wenyuan Liu, Huaqing Zheng, Xindian Ma, Peng Zhang · 2026-06-26 04:00

SharQ: Bridging Activation Sparsity and FP4 Quantization for LLM Inference

arXiv:2606.26587v1 Announce Type: cross Abstract: Low-bit floating-point formats and semi-structured sparsity are increasingly supported by modern accelerators, yet combining them for LLM activation compression remains challenging: activations contain input-dependent outliers tha…

arXiv cs.CL TIER_1 English(EN) · Carrie Chen · 2026-06-25 20:54

EntMTP: Accelerating LLM Inference via Entropy-Guided Multi-Token Prediction

Multi-token prediction has been shown to increase data density during training, improve downstream text-generation quality, and serves as the defacto approach for self-speculative decoding. Existing foundation and open source models that use MTP heads commit to a static tree-base…

arXiv cs.CL TIER_1 English(EN) · Gaoliang Peng · 2026-06-25 10:44

Cascaded Multi-Granularity Pruning for On-Device LLM Inference in the Industrial Internet of Things

Deploying large language models (LLMs) on Industrial Internet of Things (IIoT) edge devices demands extreme compression, yet existing structured pruning methods collapse at high compression ratios due to one-shot importance estimation, and their cross-architecture behavior remain…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-25 10:44

Cascaded Multi-Granularity Pruning for On-Device LLM Inference in the Industrial Internet of Things

Deploying large language models (LLMs) on Industrial Internet of Things (IIoT) edge devices demands extreme compression, yet existing structured pruning methods collapse at high compression ratios due to one-shot importance estimation, and their cross-architecture behavior remain…

arXiv cs.LG TIER_1 English(EN) · Peng Zhang · 2026-06-25 04:19

SharQ: Bridging Activation Sparsity and FP4 Quantization for LLM Inference

Low-bit floating-point formats and semi-structured sparsity are increasingly supported by modern accelerators, yet combining them for LLM activation compression remains challenging: activations contain input-dependent outliers that dominate block scales in FP4 quantization, and d…

arXiv cs.LG TIER_1 English(EN) · DatologyAI: Matthew L. Leavitt, Siddharth Joshi, Haoli Yin, Rishabh Adiga, Haakon Mongstad, Alvin Deng, David Schwab, Bogdan Gaza, Ari Morcos · 2026-06-25 04:00

Brevity Is the Soul of Inference Efficiency: Inducing Concision in Vision-Language Models (VLMs) via Data Curation

arXiv:2606.25432v1 Announce Type: new Abstract: Inference efficiency is typically pursued by shrinking the model: distillation, pruning, quantization, and sparse routing each lower per-token cost while treating token count as fixed. But output length has been inflating, and it is…

arXiv cs.LG TIER_1 English(EN) · Stefan Wahl, Raphaela Schenk, Ali Farnoud, Jakob H. Macke, Daniel Gedon · 2026-06-25 04:00

A Probabilistic Framework for LLM-Based Model Discovery

arXiv:2602.18266v2 Announce Type: replace Abstract: Automated methods for discovering mechanistic simulator models from observational data offer a promising path toward accelerating scientific progress. Such methods often take the form of agentic-style iterative workflows that re…

arXiv cs.CL TIER_1 English(EN) · Jaeyong Ko, Pilsung Kang, Yukyung Lee · 2026-06-25 04:00

Cliff Tokens: Identifying Single-Token Failure Triggers in LLM Mathematical Reasoning

arXiv:2606.25524v1 Announce Type: cross Abstract: Large language models (LLMs) reach high accuracy in mathematical reasoning, but individual traces on the same problem diverge; some arrive at the correct answer while others fail. Prior work analyzes failure at the step, chunk, or…

arXiv cs.LG TIER_1 English(EN) · Francisco Ferreira da Silva, Stefan Heimersheim · 2026-06-25 04:00

Evidence of Feature-Specific Error Correction in Large Language Models

arXiv:2606.24964v1 Announce Type: new Abstract: Understanding the features of large language models (LLMs) is a central goal of interpretability. LLMs are commonly assumed to use superposition to represent more features than they have dimensions. They may not only represent featu…

arXiv cs.AI TIER_1 English(EN) · Yukyung Lee · 2026-06-24 08:03

Cliff Tokens: Identifying Single-Token Failure Triggers in LLM Mathematical Reasoning

Large language models (LLMs) reach high accuracy in mathematical reasoning, but individual traces on the same problem diverge; some arrive at the correct answer while others fail. Prior work analyzes failure at the step, chunk, or sentence level, or at tokens where failure has al…

arXiv cs.AI TIER_1 English(EN) · Ari Morcos · 2026-06-24 05:50

Brevity Is the Soul of Inference Efficiency: Inducing Concision in Vision-Language Models (VLMs) via Data Curation

Inference efficiency is typically pursued by shrinking the model: distillation, pruning, quantization, and sparse routing each lower per-token cost while treating token count as fixed. But output length has been inflating, and it is precisely the component the standard toolkit le…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-24 05:50

Brevity Is the Soul of Inference Efficiency: Inducing Concision in Vision-Language Models (VLMs) via Data Curation

Inference efficiency is typically pursued by shrinking the model: distillation, pruning, quantization, and sparse routing each lower per-token cost while treating token count as fixed. But output length has been inflating, and it is precisely the component the standard toolkit le…

arXiv cs.AI TIER_1 English(EN) · Ismail Labiad, Mathurin Videau, Matthieu Kowalski, Marc Schoenauer, Alessandro Leite, Julia Kempe, Olivier Teytaud · 2026-06-24 04:00

Tuning without Peeking: Provable Generalization Bounds and Robust LLM Post-Training

arXiv:2507.01752v4 Announce Type: replace-cross Abstract: Gradient-based optimization is the workhorse of deep learning, offering efficient and scalable training via backpropagation. However, exposing gradients during training can leak sensitive information about the underlying d…

arXiv cs.LG TIER_1 English(EN) · Bohua Zou, Nian Liu, Binqi Sun, Matteo Mascherin, Debayan Roy, Yutao Liu, Yu Peng, Ning Jia, Haibo Chen · 2026-06-24 04:00

EnerInfer: Energy-Aware On-Device LLM Inference

arXiv:2606.23001v1 Announce Type: cross Abstract: On-device LLM inference is increasingly attractive for privacy-preserving, reliable, and cost-effective deployment, yet its energy and thermal costs remain a critical bottleneck. Existing systems primarily optimize for decoding sp…

arXiv cs.CL TIER_1 English(EN) · Quan Xiao, Yutong Xuan, Gaowen Liu, Ramana Rao Kompella, Tianyi Chen · 2026-06-24 04:00

Bilevel Data Curation for LLM Fine-Tuning: Offline Selection and Online Self-Refining Generation

arXiv:2511.21056v2 Announce Type: replace-cross Abstract: Supervised fine-tuning (SFT) datasets are critical to the downstream performance of large language models, yet they often contain low-quality or harmful question-response pairs. To improve SFT data quality, we develop a un…

arXiv cs.AI TIER_1 English(EN) · Zijin Hong, Hao Wu, Su Dong, Junnan Dong, Yilin Xiao, Yujing Zhang, Zhu Wang, Feiran Huang, Linyi Li, Hongxia Yang, Xiao Huang · 2026-06-24 04:00

Benchmarking the Mathematical Reasoning of Large Language Models with Unseen Random Variables Questions

arXiv:2501.11790v5 Announce Type: replace-cross Abstract: Recent studies have raised significant concerns regarding the reliability of current mathematics benchmarks, highlighting issues such as simplistic design and potential data contamination. Consequently, developing a reliab…

arXiv cs.AI TIER_1 English(EN) · Yucheng Wu, Jundong Xu, Mingzhen Ju, Yue Yu, Chenpeng Wang, Haoxuan Li, Liangming Pan · 2026-06-24 04:00

HOLMES: Evaluating Higher-Order Logical Reasoning in Large Language Models

arXiv:2606.23238v2 Announce Type: replace Abstract: Logical reasoning is essential for reliable AI, yet existing benchmarks are largely first-order-logic-centric, focusing on object-level deduction over fixed predicates. This misses many realistic scenarios where models must reas…

arXiv cs.AI TIER_1 English(EN) · Tianbao Ma, Chang Xi, Yichuan Zou, Chengen Li, Linxun Chen, Zilong Lu, Yanan Niu, Zhaojie Liu, Han Li, Kun Gai · 2026-06-24 04:00

ScaleToT: General Structured LLM Reasoning for Billion-Scale Low-Activity User Modeling

arXiv:2606.24605v1 Announce Type: new Abstract: Accurate user modeling often depends on rich interaction histories, which are unavailable for billions of low-activity users. Large Language Models (LLMs) can infer latent user states from static profiles, but this reasoning becomes…

arXiv cs.AI TIER_1 English(EN) · Xiaolin Lin, Jingcun Wang, Olga Kondrateva, Yiyu Shi, Bing Li, Grace Li Zhang · 2026-06-24 04:00

CompressKV: Semantic-Retrieval-Guided KV Cache Compression for Resource-Efficient Long-Context LLM Inference

arXiv:2606.24467v1 Announce Type: new Abstract: Long-context large language model (LLM) inference is increasingly constrained by the memory footprint and decoding cost of key-value (KV) caches, limiting sustainable deployment on resource-constrained hardware. Existing KV cache ev…

arXiv cs.AI TIER_1 English(EN) · Kun Gai · 2026-06-23 14:05

ScaleToT: General Structured LLM Reasoning for Billion-Scale Low-Activity User Modeling

Accurate user modeling often depends on rich interaction histories, which are unavailable for billions of low-activity users. Large Language Models (LLMs) can infer latent user states from static profiles, but this reasoning becomes unreliable when profiles are sparse, and applyi…

arXiv cs.AI TIER_1 English(EN) · Grace Li Zhang · 2026-06-23 11:59

CompressKV: Semantic-Retrieval-Guided KV Cache Compression for Resource-Efficient Long-Context LLM Inference

Long-context large language model (LLM) inference is increasingly constrained by the memory footprint and decoding cost of key-value (KV) caches, limiting sustainable deployment on resource-constrained hardware. Existing KV cache eviction methods typically apply heuristic token s…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-23 11:59

CompressKV: Semantic-Retrieval-Guided KV Cache Compression for Resource-Efficient Long-Context LLM Inference

Long-context large language model (LLM) inference is increasingly constrained by the memory footprint and decoding cost of key-value (KV) caches, limiting sustainable deployment on resource-constrained hardware. Existing KV cache eviction methods typically apply heuristic token s…

Alignment Forum TIER_1 English(EN) · Josh Engels · 2026-06-22 22:26

LLM-Driven Feature Discovery

We would often like to get a qualitative sense of a target model’s behaviors in important distributions (e.g. deployment, RL training, or evals). For example, we might want to discover novel behaviors (https://alignment.anthropic.com/2026/petri-v2/)…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-22 16:06

Concordia: JIT-Compiled Persistent-Kernel Checkpointing for Fault-Tolerant LLM Inference

Long-running LLM agents keep valuable state resident on GPUs: KV caches, request schedulers, communication state, and sometimes online adapters. Losing this state after a GPU or communicator failure can discard minutes to hours of work, yet existing recovery mechanisms either res…

arXiv cs.LG TIER_1 English(EN) · Chen Qian · 2026-06-22 16:06

Concordia: JIT-Compiled Persistent-Kernel Checkpointing for Fault-Tolerant LLM Inference

Long-running LLM agents keep valuable state resident on GPUs: KV caches, request schedulers, communication state, and sometimes online adapters. Losing this state after a GPU or communicator failure can discard minutes to hours of work, yet existing recovery mechanisms either res…

arXiv cs.CL TIER_1 English(EN) · Fanghen Li · 2026-06-22 14:19

Can LLM Embedding Spaces Recover Expert Structure?

Pretrained text embeddings are increasingly used as representational maps, yet high category separability does not imply that their geometry recovers expert-defined structure. We study this problem in mental-health-related language, where symptom relations provide an external ref…

arXiv cs.CL TIER_1 English(EN) · Xiangnan He · 2026-06-22 12:58

Towards Root Memory: Benchmarking and Enhancing Implicit Logical Memory Retrieval for Personalized LLMs

Memory systems are essential for personalized Large Language Models (LLMs). However, existing retrieval methods in these systems primarily rely on semantic similarity, potentially missing logically critical memories with limited semantic overlap. Current benchmarks remain inadequ…

arXiv cs.CL TIER_1 English(EN) · Wen Zhang · 2026-06-22 12:50

Expanding LLM Knowledge Boundaries via Distribution-Optimized Synthesis

Knowledge injection via synthetic data is crucial for enhancing Large Language Models (LLMs). However, current synthesis methods simply stop at preset token counts or fixed data ratios, lacking awareness of knowledge distribution. This results in some domains being sparse while o…

arXiv cs.AI TIER_1 English(EN) · Liangming Pan · 2026-06-22 12:23

HOLMES: Evaluating Higher-Order Logical Reasoning in Large Language Models

Logical reasoning is essential for reliable AI, yet existing benchmarks are largely first-order-logic-centric, focusing on object-level deduction over fixed predicates. This misses many realistic scenarios where models must reason over rules, predicates, functions, constraints, a…

arXiv cs.CL TIER_1 English(EN) · Rada Mihalcea · 2026-06-20 04:12

The Language-Energy Gap: Measuring the Energy Cost of Multilingual LLM Inference

Large language models (LLMs) are increasingly deployed in multilingual settings, yet the energy costs of serving these models across different languages remain poorly understood. We present a systematic study of inference energy consumption across languages with ML.Energy framewo…

arXiv cs.CL TIER_1 English(EN) · Joel Stremmel · 2026-06-19 20:23

Denoising Iterative Self-Correction: Structured Verification Loops for Reliable LLM Reasoning

Large language models produce fluent but often incorrect multi-step reasoning, and naive correction methods risk degrading already-correct answers. We introduce Denoising Iterative Self-Correction (DISC), a test-time procedure that treats verification question outputs as noisy me…

arXiv cs.CL TIER_1 English(EN) · Shiguo Lian, Kai Wang, Zhaoxiang Liu, Wen Liu, Minjie Hua, Yutong Liu, Jiangze Yan, Xin Wang, Cong Wang, Yilin Zhang, Yi Shen, Jieyun Huang, Fang Zhao, Huanlin Gao, Ping Chen, Xinyu Yang, Kaikai Zhao, Yao Zhao, Xinggang Wang, Huishuai Zhang, Dongyan Zhao… · 2026-06-19 04:00

Large Model Inference Optimization Techniques for Token Operations

arXiv:2606.20295v1 Announce Type: cross Abstract: Large model inference optimization serves as a key foundation for supporting the scalable, low-cost, and highly stable operation of large model services. Centered on token-oriented inference optimization technology, this paper pro…

arXiv cs.CL TIER_1 English(EN) · Yu Deng · 2026-06-19 04:00

GEMS: Geometric Constraints Enable Multi-Semantic Superposition in Large Language Models

arXiv:2606.19946v1 Announce Type: new Abstract: Activation steering controls model behavior by modifying intermediate hidden states at inference time without retraining. Existing methods handle only single-direction injection; when multiple semantic directions are superposed with…

arXiv cs.CL TIER_1 English(EN) · Jinseok Chung, Minkyoung Song, Hyunji Jung, Namhoon Lee · 2026-06-19 04:00

Quantifying Aleatoric Uncertainty in In-Context Learning for Robust Measurement of LLM Prediction Confidence

arXiv:2606.19353v1 Announce Type: new Abstract: In-Context Learning (ICL) allows LLMs to adapt to new tasks from a few demonstrations, but its reliability remains a concern: predictions are highly sensitive to both prompt design and the model's ability to understand the context, …

arXiv cs.LG TIER_1 English(EN) · Abhinit Sen, Ajeet Kumar, Manaranjan Pradhan · 2026-06-19 04:00

Bridging the Social-Semantic Gap: SPSD, Edge-Side Prompt Compression for Cloud LLM Inference

arXiv:2606.19364v1 Announce Type: new Abstract: The prefill stage of Large Language Model (LLM) inference is a growing contributor to cloud-scale energy cost. Many consumer-support and conversational prompts contain social scaffolding: politeness markers, apologetic preamble, rep…

arXiv cs.AI TIER_1 English(EN) · Xuanzhi Feng, Zhengyang Li, Zeyu Liu, Haoxi Li, Yuming Jiang, Bing Guo, Jingcai Guo, Jie Zhang, Song Guo · 2026-06-19 04:00

Beyond Entropy: Learning from Token-Level Distributional Deviation to Improve LLM Reasoning

arXiv:2606.19771v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has significantly advanced Large Language Model (LLM) reasoning; however, it faces a fundamental optimization instability: uniform token updates precipitate entropy collapse, lea…

arXiv cs.CL TIER_1 English(EN) · Qinghuai Ma · 2026-06-18 14:33

Token-Oriented Inference Optimization Techniques for Large Models

Large model inference optimization serves as a key foundation for supporting the scalable, low-cost, and highly stable operation of large model services. Centered on token-oriented inference optimization technology, this paper proposes for the first time a four-layer technical ar…

arXiv cs.CL TIER_1 English(EN) · Yu Deng · 2026-06-18 08:43

GEMS: Geometric Constraints Enable Multi-Semantic Superposition in Large Language Models

Activation steering controls model behavior by modifying intermediate hidden states at inference time without retraining. Existing methods handle only single-direction injection; when multiple semantic directions are superposed without constraints, the model collapses. We show th…

arXiv cs.LG TIER_1 English(EN) · Jiaxing Wang, Deping Xiang, Jin Xu, Zirui Liu, Zicheng Zhang, Guoqiang Gong, Jun Fang, Chao Liu, Pengzhang Liu, Tongxuan Liu, Ke Zhang, Qixia Jiang · 2026-06-18 04:00

BLADE: Scalable Bilevel Adaptive Data Selection for LLM Training

arXiv:2606.18650v1 Announce Type: new Abstract: As Large Language Model (LLM) datasets scale to trillions of tokens, data selection has emerged as a critical frontier to filter out uninformative noise and construct adaptive learning trajectories. Beyond static heuristic filtering…

arXiv cs.LG TIER_1 English(EN) · Yueying Li, Yuanfan Chen, Jiayang Chen, Esha Choukse, Haoran Qiu, G. Edward Suh, Rodrigo Fonseca, Ziv Scully, Udit Gupta · 2026-06-18 04:00

Beyond Prediction: Tail-Aware Scheduling for LLM Inference

arXiv:2606.18431v1 Announce Type: new Abstract: LLM serving exhibits extreme length variability, making size-based scheduling difficult in practice. Recent LLM schedulers approximate SJF/SRPT using predicted decode lengths or ranks and primarily report mean-centric metrics such a…

arXiv cs.AI TIER_1 English(EN) · Yan Scholten, Sophie Xhonneux, Leo Schwinn, Stephan Günnemann · 2026-06-18 04:00

Model Collapse Is a Feature, Not a Bug, in Machine Unlearning for LLMs

arXiv:2507.04219v5 Announce Type: replace-cross Abstract: Current unlearning methods for LLMs optimize on the private information they seek to remove by incorporating it into their fine-tuning data. We argue this not only risks reinforcing exposure to sensitive data, but also fun…

arXiv cs.AI TIER_1 English(EN) · Shabari S Nair, Krishanu Saini · 2026-06-17 04:00

Enabling Distributed LLM Inference over P2P Networks

arXiv:2606.17059v1 Announce Type: cross Abstract: Prefix caching can reduce LLM inference latency by reusing KV caches across requests with shared prompts, but cluster-scale reuse is challenging because caches are partitioned across nodes. We propose a decentralized, prefix-cache…

arXiv cs.AI TIER_1 English(EN) · Shun Usami, Venkatram Vishwanath, E. Wes Bethel · 2026-06-17 04:00

Prefill/Decode-Aware Evaluation of LLM Inference on Emerging AI Accelerators

arXiv:2606.17104v1 Announce Type: cross Abstract: As large language models (LLMs) are increasingly deployed in latency- and cost-sensitive settings, inference efficiency has become a central systems challenge. While GPUs dominate current deployments, a growing number of AI accele…

arXiv cs.AI TIER_1 English(EN) · Jessica McFadyen, Ole Jorgensen, Harry Coppock, Kevin Wei, Cozmin Ududec · 2026-06-17 04:00

How Inference Compute Shapes Frontier LLM Evaluation

arXiv:2606.17930v1 Announce Type: new Abstract: AI evaluations are shifting toward harder tasks that benefit from longer trajectories involving tool use and iterative problem solving. As a result, performance is increasingly sensitive to the amount and allocation of compute avail…

arXiv cs.LG TIER_1 English(EN) · Md Abdullah Al Mamun, Ngoc Phu Doan, Pedram Zaree, Ihsen Alouani, Nael Abu-Ghazaleh · 2026-06-17 04:00

Loss Landscape Poisoning: Targeted Extraction of Unseen Training Data from Large Language Models

arXiv:2606.17110v1 Announce Type: cross Abstract: Large Language Models are increasingly trained on proprietary or sensitive data, from private healthcare and financial records to user conversations containing secrets. Ensuring the privacy of such data against extraction attacks …

arXiv cs.CL TIER_1 English(EN) · Dong Huang, Jianbo Sun, Pengkun Yang · 2026-06-17 04:00

Prompt Perturbation for Reliable LLM Evaluation on Comparison Graphs

arXiv:2606.17634v1 Announce Type: new Abstract: Evaluating large language models (LLMs) is important for understanding their capabilities, comparing competing systems, and supporting the deployment of reliable models in practice. For open-ended tasks, pairwise evaluation has beco…

arXiv cs.CL TIER_1 English(EN) · Filip Sondej, Yushi Yang, Adam Mahdi · 2026-06-17 04:00

RepSelect: Robust LLM Unlearning via Representation Selection

arXiv:2606.17168v1 Announce Type: new Abstract: Making large language models (LLMs) deeply forget specific knowledge and values without sacrificing general capabilities remains a central challenge in unlearning. However, current methods are easily reversed by fine-tuning or few-s…

arXiv cs.AI TIER_1 English(EN) · Cozmin Ududec · 2026-06-16 13:40

How Inference Compute Shapes Frontier LLM Evaluation

AI evaluations are shifting toward harder tasks that benefit from longer trajectories involving tool use and iterative problem solving. As a result, performance is increasingly sensitive to the amount and allocation of compute available at test time ("inference compute"). Yet man…

arXiv cs.CL TIER_1 English(EN) · Pengkun Yang · 2026-06-16 07:44

Prompt Perturbation for Reliable LLM Evaluation on Comparison Graphs

Evaluating large language models (LLMs) is important for understanding their capabilities, comparing competing systems, and supporting the deployment of reliable models in practice. For open-ended tasks, pairwise evaluation has become a popular paradigm, in which two responses to…

arXiv cs.AI TIER_1 English(EN) · Ziqun Chen, Ming Wu, Michael Heinrich, Jason Zeng, Huiying Lan, Tianwei Zhang, Rui Tan · 2026-06-16 04:00

Efficient Verifiable Attention for LLM Inference

arXiv:2606.16352v1 Announce Type: cross Abstract: Computation integrity of remote large language model (LLM) serving can be questionable. For conventional deep neural networks (DNNs), the existing TEE-shielded DNN partitioning (TSDP) approach uses Trusted Execution Environment (T…

arXiv cs.LG TIER_1 English(EN) · Lorenzo Sani, Alex Iacob, Zeyu Cao, Royson Lee, Bill Marino, Yan Gao, Dongqi Cai, Zexi Li, Wanru Zhao, Xinchi Qiu, Nicholas D. Lane · 2026-06-16 04:00

Photon: Federated LLM Pre-Training

arXiv:2411.02908v2 Announce Type: replace Abstract: Scaling large language models (LLMs) demands extensive data and computing resources, which are traditionally constrained to data centers by the high-bandwidth requirements of distributed training. Low-bandwidth methods like fede…

arXiv cs.LG TIER_1 English(EN) · Yingnan Zhao, Razvan Bunescu, Ahmed Louri, Avinash Karanth, Ke Wang · 2026-06-16 04:00

A Spatio-Temporal Expert Prefetching Framework for Efficient MoE LLM Inference

arXiv:2606.15453v1 Announce Type: cross Abstract: Mixture-of-Experts (MoE) based large language models (LLMs), such as Qwen and DeepSeek, have recently emerged as an effective approach to improving model capacity without proportionally increasing computational cost. By replacing …

arXiv cs.LG TIER_1 English(EN) · Alexander Yukhimchuk, Andrey Shulga, Mladen Kolar, Martin Takáč · 2026-06-16 04:00

Privacy from Symmetry: Orthogonally Equivariant Transformers for LLM Inference

arXiv:2606.16461v1 Announce Type: new Abstract: Running large language models locally is often impractical, pushing inference on sensitive text to third-party providers. Split inference partially mitigates this by keeping tokens on the client and sending only hidden representatio…

arXiv cs.CL TIER_1 English(EN) · Joris Köster, Zixuan Liu, Siavash Khajavi, Zizhan Zheng · 2026-06-16 04:00

MemBoost: A Memory-Augmented Framework for Cost-Aware LLM Inference

arXiv:2603.26557v2 Announce Type: replace Abstract: Large Language Models (LLMs) deliver strong performance but incur high inference cost in real-world services, especially under workloads with repeated or near-duplicate queries across users and sessions. In this work, we propose…

arXiv cs.CL TIER_1 English(EN) · Jie Hu, Shengnan Wang, Yutong He, Ping Gong, Jiawei Yi, Juncheng Zhang, Youhui Bai, Renhai Chen, Gong Zhang, Cheng Li, Kun Yuan · 2026-06-16 04:00

CentroidKV: Efficient Long-Context LLM Inference via KV Cache Clustering

arXiv:2506.11418v2 Announce Type: replace Abstract: Large language models (LLMs) with extended context windows have become increasingly prevalent for tackling complex tasks. However, the substantial Key-Value (KV) cache required for long-context LLMs poses significant deployment …

arXiv cs.CL TIER_1 English(EN) · Yangjia Hu, Haodong Wang, Zicong Hong, Qianli Liu, Quanxin Shou, Jian Lin, Song Guo, Xiaowei Shen, Xiangjun Huang, Dian Wang, Jian Yang · 2026-06-16 04:00

MosaicQuant: Inlier-Outlier Separation for Unified 4-Bit LLM Quantization

arXiv:2606.15652v1 Announce Type: cross Abstract: 4-bit quantization significantly reduces the memory footprint and accelerates the inference of large language models (LLMs). However, its limited bit-width representation struggles to faithfully capture both dense common values (\…

arXiv cs.AI TIER_1 English(EN) · Jing Ma, Chenhao Dang, Mingjie Liao · 2026-06-16 04:00

AC-ODM: Actor-Critic Online Data Mixing for Sample-Efficient LLM Pretraining

arXiv:2505.23878v2 Announce Type: replace-cross Abstract: Optimizing pretraining data composition is pivotal for LLM generalization. While dynamic mixing outperforms static strategies by capturing evolving training dynamics, current methods fail to reconcile computational efficie…

arXiv cs.AI TIER_1 English(EN) · Youngcheon You, Banseok Lee, Minseop Choi, Seonyoung Kim, Hyochan Chong, Changdong Kim, Youngmin Kim, Dongkyu Kim · 2026-06-16 04:00

RaBiT: Residual-Aware Binarization Training for Accurate and Efficient LLMs

arXiv:2602.05367v3 Announce Type: replace Abstract: Efficient deployment of large language models (LLMs) requires extreme quantization, forcing a critical trade-off between low-bit efficiency and performance. Residual binarization enables hardware-friendly, matmul-free inference …

arXiv cs.AI TIER_1 English(EN) · Yizhen Yao, Qinglin Zhu, Runcong Zhao, Xiangxiang Dai, Yanzheng Xiang, Yulan He, Lin Gui · 2026-06-16 04:00

Following the Latent Roadmap: Navigating Revocable Diffusion LLM Decoding with Anchor Tokens

arXiv:2606.16847v1 Announce Type: cross Abstract: Diffusion Large Language Models (dLLMs) offer a promising avenue for parallel generation but face a trade-off between decoding speed and quality. While revocable decoding strategies attempt to mitigate errors by verifying and rema…

arXiv cs.AI TIER_1 English(EN) · Feiyang Chen, Haibo Chen · 2026-06-16 04:00

SMEPilot: Characterizing and Optimizing LLM Inference with the Scalable Matrix Extension

arXiv:2606.16332v1 Announce Type: cross Abstract: Modern CPUs increasingly integrate matrix extensions, such as Arm Scalable Matrix Extension (SME), that provide high-throughput matrix execution within the CPU. For LLM inference, however, these units are not a universal replaceme…

arXiv cs.AI TIER_1 English(EN) · Jinlong Yang · 2026-06-16 04:00

Heteroscedastic Signals in Budgeted LLM Verification: Structural Heterogeneity Limits Optimization Gains

arXiv:2606.15841v1 Announce Type: new Abstract: Large language model (LLM) systems increasingly use uncertainty signals to allocate limited computation across verification, test-time scaling, tool execution, and other selective-compute decisions. Such policies rely on a \emph{glo…

arXiv cs.CL TIER_1 English(EN) · Adam Mahdi · 2026-06-15 18:06

RepSelect: Robust LLM Unlearning via Representation Selection

Making large language models (LLMs) deeply forget specific knowledge and values without sacrificing general capabilities remains a central challenge in unlearning. However, current methods are easily reversed by fine-tuning or few-shot prompting, suggesting their forgetting is on…

arXiv cs.AI TIER_1 English(EN) · Lin Gui · 2026-06-15 15:23

Following the Latent Roadmap: Navigating Revocable Diffusion LLM Decoding with Anchor Tokens

Diffusion Large Language Models (dLLMs) offer a promising avenue for parallel generation but face a trade-off between decoding speed and quality. While revocable decoding strategies attempt to mitigate errors by verifying and remasking tokens, they typically operate within a mixe…

arXiv cs.AI TIER_1 English(EN) · Anas Nassar, Steve Mohr, Leonard Apanasevich, Himanshu Sharma · 2026-06-15 04:00

STREAM: Multi-Tier LLM Inference Middleware with Dual-Channel HPC Token Streaming

arXiv:2606.13968v1 Announce Type: cross Abstract: Researchers and practitioners working with large language models face a fragmented landscape: local models are free and private but hardware limits the model size and context windows a researcher can use; institutional HPC centers…

arXiv cs.AI TIER_1 English(EN) · Hengjie Cao, Zhendong Huang, Mengyi Chen, Yifeng Yang, Fang Dong, Anrui Chen, Ruijun Huang, Xin Zhang, Mingzhi Dong, Yujiang Wang, Jinlong Hou, Qin Lv, Robert P. Dick, Yuan Cheng, Tun Lu, Fan Yang, Yixuan Chen, Li Shang · 2026-06-15 04:00

The Curse and Blessing of Mean Bias in FP4-Quantized LLM Training

arXiv:2603.10444v2 Announce Type: replace-cross Abstract: FP4 training promises substantial memory and compute savings for large language models, but remains fragile because blockwise quantization is dictated by extreme activation magnitudes, which inflate dynamic range and compr…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-15 00:00

RepSelect: Robust LLM Unlearning via Representation Selection

RepSelect isolates forget-set-specific representations in LLMs by collapsing top principal components of weight gradients, achieving deeper and more robust unlearning compared to existing methods.

arXiv cs.AI TIER_1 English(EN) · Xucong Wang, Ziyu Ma, Yong Wang, Shidong Yang, Hailang Huang, Renda Li, Pengkun Wang, Xiangxiang Chu · 2026-06-12 04:00

ReSum: Synergizing LLM Reasoning and Summarization with Reinforcement Learning

arXiv:2606.13316v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) is a central technique for improving long-horizon reasoning in Large Language Models (LLMs). However, existing RLVR methods often encourage unnecessarily long reasoning rollouts,…

arXiv cs.AI TIER_1 English(EN) · Wenbo Chen, Puheng Li, Mengyang Liu, Weijie Su, Tianpei Xie · 2026-06-12 04:00

MARS: Marginal Adversarial Risk-Controlled Stopping for Parallel LLM Test-Time Scaling

arXiv:2606.12935v1 Announce Type: new Abstract: Parallel test-time scaling samples many reasoning traces and majority-votes their answers, improving LLM accuracy but requiring traces to run to completion, incurring substantial computational overhead. We observe that probing parti…
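Editor's sketch: the baseline this abstract describes (sample several reasoning traces, then majority-vote their final answers) can be illustrated in a few lines. This is not the MARS stopping rule itself; the function name `majority_vote` and the sample answers are hypothetical, and answer extraction from traces is assumed to have happened already.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most common final answer and its share of the votes."""
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(answers)
    # most_common(1) yields the (answer, count) pair with the highest count;
    # ties are broken by first occurrence.
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(answers)


# Hypothetical final answers extracted from five completed traces.
sampled = ["42", "41", "42", "42", "17"]
answer, share = majority_vote(sampled)
```

Every trace here runs to completion before voting, which is the overhead the abstract says early stopping aims to reduce.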

arXiv cs.AI TIER_1 English(EN) · Xiangxiang Chu · 2026-06-11 13:10

ReSum: Synergizing LLM Reasoning and Summarization with Reinforcement Learning

Reinforcement Learning with Verifiable Rewards (RLVR) is a central technique for improving long-horizon reasoning in Large Language Models (LLMs). However, existing RLVR methods often encourage unnecessarily long reasoning rollouts, which can degrade reasoning coherence and exhau…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-11 05:56

MARS: Marginal Adversarial Risk-Controlled Stopping for Parallel LLM Test-Time Scaling

Parallel test-time scaling samples many reasoning traces and majority-votes their answers, improving LLM accuracy but requiring traces to run to completion, incurring substantial computational overhead. We observe that probing partial traces at intermediate checkpoints can extrac…

arXiv cs.AI TIER_1 English(EN) · Wesley Pang, Gregory Hyegang Jun, Feiyang Liu, Deming Chen · 2026-06-11 04:00

TileFuse: A Fused Mixed-Precision Kernel Library for Efficient Quantized LLM Inference on AMD NPUs

arXiv:2606.11357v1 Announce Type: cross Abstract: With the growing demand for on-device LLM inference, edge SoCs increasingly integrate NPUs to improve performance and energy efficiency under tight power and thermal budgets. However, practical LLM deployment on current client NPU…

arXiv cs.AI TIER_1 English(EN) · Ruxue Shi, Yili Wang, Mengnan Du, Hangting Ye, Yi Chang, Xin Wang · 2026-06-11 04:00

TAROT: Task-Adaptive Refinement of LLM Prior Graphs for Few-Shot Tabular Learning

arXiv:2606.11640v1 Announce Type: cross Abstract: Few-shot tabular learning provides a cost-effective approach for real-world applications where annotation is costly and collecting sufficient samples for new tasks is difficult. Existing Traditional and LLM-based methods have demo…

arXiv cs.AI TIER_1 English(EN) · Arther Tian, Alex Ding, Frank Chen, Simon Wu, Aaron Chan · 2026-06-11 04:00

PoQ-Judge: A Multi-Architecture Evaluation Framework for Cost-Aware Proof of Quality in Decentralized LLM Inference

arXiv:2606.11196v1 Announce Type: cross Abstract: Decentralized LLM inference networks need lightweight, reference-free quality evaluation for Proof of Quality (PoQ). We present PoQ-Judge, a framework that trains dedicated judge models to score query-output pairs without ground-t…

arXiv cs.CL TIER_1 English(EN) · Feihu Jin, Shipeng Cen, Ying Tan · 2026-06-11 04:00

Guided Noise: Turning Random Perturbations into Effective Descent for Memory-Efficient LLM Fine-Tuning

arXiv:2601.04710v2 Announce Type: replace Abstract: Fine-tuning large language models (LLMs) achieves strong performance but is often limited by the memory overhead of backpropagation. Zeroth-order (ZO) optimization avoids this overhead by estimating gradients through forward pas…

arXiv cs.CL TIER_1 English(EN) · Zhuoyi Peng, Jingzhou Jiang, Hanlin Gu, Lixin Fan, Yi Yang · 2026-06-11 04:00

GraphInfer-Bench: Benchmarking LLM Inference Ability on Graph Data

arXiv:2606.11562v1 Announce Type: cross Abstract: Graph analysis underlies many applications whose answers cannot be looked up in a single record or retrieved along a path: laundering rings, drug repurposing, user preference, and scientific theme are all inferred from a node toge…

arXiv cs.CL TIER_1 English(EN) · Ao Sun · 2026-06-11 04:00

A Controlled Study of Decoding-Time Truthfulness Methods for Instruction-Tuned LLMs

arXiv:2606.12160v1 Announce Type: new Abstract: In this work, we introduce CHAIR (Classifier of Hallucination As ImproveR), a supervised framework for detecting hallucinations by analyzing internal logits from each layer of every token. Our method extracts a compact set of featur…

arXiv cs.AI TIER_1 English(EN) · Mingyi Luo, Ruichen Zhang, Xiangwang Hou, Jun Du, Chunxiao Jiang, Yong Ren, Shiwen Mao · 2026-06-11 04:00

Resource-Aware LLM Inference for Mobile Edge General Intelligence

arXiv:2509.23248v3 Announce Type: replace Abstract: The rapid advancement of large language models (LLMs) has enabled an emergence of agentic artificial intelligence (AI) with powerful reasoning and autonomous decision-making capabilities. This integration with edge computing has…

arXiv cs.AI TIER_1 English(EN) · Selen Erkan, Bastian Boll, Kristian Kersting, Björn Deiseroth, Letitia Parcalabescu · 2026-06-11 04:00

Soft-Prompt Tuning for Fair and Efficient LLM Benchmark Evaluation

arXiv:2606.12117v1 Announce Type: cross Abstract: Benchmark scores often misrepresent a large language model's (LLM's) knowledge, because they rely, e.g., on the model's ability to follow specific formatting requirements. This especially penalizes base models that may know the co…

arXiv cs.CL TIER_1 English(EN) · Ao Sun · 2026-06-10 14:48

A Controlled Study of Decoding-Time Truthfulness Methods for Instruction-Tuned LLMs

In this work, we introduce CHAIR (Classifier of Hallucination As ImproveR), a supervised framework for detecting hallucinations by analyzing internal logits from each layer of every token. Our method extracts a compact set of features such as maximum, minimum, mean, standard devi…

arXiv cs.AI TIER_1 English(EN) · Letitia Parcalabescu · 2026-06-10 14:12

Soft-Prompt Tuning for Fair and Efficient LLM Benchmark Evaluation

Benchmark scores often misrepresent a large language model's (LLM's) knowledge, because they rely, e.g., on the model's ability to follow specific formatting requirements. This especially penalizes base models that may know the correct answers but lack the ability -- typically in…

arXiv cs.AI TIER_1 English(EN) · Polydoros Giannouris, Mohsinul Kabir, Sophia Ananiadou · 2026-06-10 04:00

Janus: Benchmarking Goal-Conditioned Information Distortion in LLMs

arXiv:2606.10852v1 Announce Type: cross Abstract: LLM deception is often evaluated through direct markers such as fabricated claims, explicit lies, or strategic concealment. However, many real-world misleading communications do not depend on false statements, rather, they arise f…

arXiv cs.LG TIER_1 English(EN) · Manel Slokom, Malek Slokom, Thierno Kante · 2026-06-10 04:00

LLM-as-a-Discriminator: When Synthetic Tables Still Look Real

arXiv:2606.09865v1 Announce Type: new Abstract: Privacy and data sharing are often in tension. Many organizations use synthetic data to reduce privacy risk and still share useful data. For tabular data, auditing privacy remains hard. In many cases, even humans cannot easily tell …

arXiv cs.CL TIER_1 English(EN) · Lena S. Bolliger, Lena A. Jäger · 2026-06-10 04:00

Training LLMs to Enforce Multi-Level Instruction Hierarchies via Gravity-Weighted Direct Preference Optimization

arXiv:2606.10860v1 Announce Type: cross Abstract: Production LLMs receive instructions from sources with very different levels of trust, yet attend to every token with uniform architectural privilege. This is the structural vulnerability that enables malicious prompt injections a…

arXiv cs.CL TIER_1 English(EN) · Jaeseong Lee, Seung-won Hwang, Samyam Rajbhandari · 2026-06-10 04:00

SpenseGPT: A Practical One-Shot Pruning Technique Enabling Sparse and Dense GEMM for LLM Inference

arXiv:2606.10445v1 Announce Type: cross Abstract: Semi-structured 2:4 sparsity is widely supported by modern accelerators, providing up to a 2x theoretical speedup. However, its strict 50% sparsity constraint often causes non-negligible accuracy degradation under post-training pr…
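Editor's sketch: the 2:4 constraint mentioned in this abstract keeps at most two non-zero weights in every group of four consecutive weights, which is where the fixed 50% sparsity comes from. The example below selects by magnitude, which is an assumption made for illustration and not the SpenseGPT criterion; `prune_2_of_4` and the sample weights are hypothetical.

```python
def prune_2_of_4(row):
    """Zero the two smallest-magnitude weights in every group of four."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


# Hypothetical weight row with two groups of four.
weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
sparse = prune_2_of_4(weights)
```

Exactly half of the entries end up zero regardless of how the magnitudes are distributed, which illustrates why the abstract calls the constraint strict.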

arXiv cs.CL TIER_1 English(EN) · Ruixuan Huang, Jinyuan Shi, Hantao Huang, Yifan Huang, Ziyi Guan, Hao Zeng, Ian En-Hsu Yen, Minghui Yu · 2026-06-10 04:00

Continual LLM Repurposing: A Predictor-Gated, Bank-Wise Sparse Training Method for Dense-to-Sparse LLMs

arXiv:2606.10722v1 Announce Type: new Abstract: We study dense-to-sparse continual training as a way to construct channel-sparse large language models from dense checkpoints. Starting from a Qwen2.5-8B dense backbone, we continue training at 32K context and introduce a predictor-…

arXiv cs.CL TIER_1 English(EN) · Keer Lu, Liwei Chen, Guoqing Jiang, Zhiheng Qin, Yunhuai Liu, Wentao Zhang · 2026-06-10 04:00

REAL: A Reasoning-Enhanced Graph Framework for Long-Term Memory Management in LLMs

arXiv:2606.10694v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly expected to interact with users over long time horizons. However, due to their finite context window, LLMs cannot retain all past interactions, making long-term memory management essenti…

arXiv cs.CL TIER_1 English(EN) · Pratibha Revankar, Kargi Chauhan, Jihye Kim, Sadiba Nusrat Nur, Vincent Siu, Chenguang Wang · 2026-06-10 04:00

MIRAGE: Polarity-Flipping Encoding Subspaces in LLM Agents

arXiv:2606.10304v1 Announce Type: new Abstract: When LLM agents are coerced into covertly encoding sensitive data (Base64, ROT13, acrostic, synonym chains, and beyond), the resulting outputs evade output-side detection but the underlying computation does not. Across nine encoding…

arXiv cs.AI TIER_1 English(EN) · Pietro Cagnasso, Eugene Belilovsky, Edouard Oyallon · 2026-06-10 04:00

Unifying Local Communication and Local Updates for LLM Pre-Training

arXiv:2606.11081v1 Announce Type: cross Abstract: Communication-efficient pre-training of LLMs is increasingly important as training draws on compute distributed across clusters, data centers, and lower-bandwidth links. Many practical methods reduce communication frequency but st…

arXiv cs.AI TIER_1 English(EN) · Vanessa Schmidt, Huy Hoang Nguyen, Cédric Jung, Shirin Salehi, Anke Schmeink · 2026-06-10 04:00

Unifying Data, Memory, and Compute Efficiency in LLM Training: A Survey

arXiv:2606.10706v1 Announce Type: cross Abstract: Resource constraints increasingly determine what can be trained, fine-tuned, and deployed in large language models (LLMs), yet efficiency is often studied through isolated techniques rather than as an interacting system of limits.…

arXiv cs.AI TIER_1 English(EN) · Huizhen Shu, Xuying Li, Piao Xue · 2026-06-10 04:00

Stop Early, Spend Less: Hidden-State Probes as a Practical Approach to Streaming LLM Output Moderation

arXiv:2606.10487v1 Announce Type: cross Abstract: Deploying large language models in user-facing systems requires efficient output safety filtering. Existing approaches typically rely on a separate moderation model applied after generation, which doubles inference cost and only d…

arXiv cs.AI TIER_1 English(EN) · Hainiu Xu, Italo Luis da Silva, Jiangnan Ye, Yuhao Wang, Wei Liu, Linyi Yang, Jonathan Richard Schwarz, Nicola Paoletti, Yulan He, Hanqi Yan · 2026-06-10 04:00

PreAct-Bench: Benchmarking Predictive Monitoring in LLMs

arXiv:2606.09890v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly deployed as autonomous agents capable of executing multi-step action trajectories toward a given objective. While existing safety research has focused on detecting unethical behavior f…

arXiv cs.AI TIER_1 English(EN) · Xinrui Chen, Jianhao Zhang, Ou Wu, Di Gao · 2026-06-10 04:00

It Takes Two to Tango: Coupled Task-Reference Selection for Safe LLM Fine-Tuning

arXiv:2606.09866v1 Announce Type: cross Abstract: Fine-tuning safety aligned large language models (LLMs) on downstream data improves adaptation but may erode learned safety behavior. Existing methods use fixed safety examples, global constraints, or one-sided task filtering. Our…

arXiv cs.AI TIER_1 English(EN) · Yunhan Jiang, Wenbin Duan, Shasha Guo, Liang Pang, Xiaoqian Sun, Huawei Shen · 2026-06-10 04:00

ActiveMem: Distributed Active Memory for Long-Horizon LLM Reasoning

arXiv:2606.10532v1 Announce Type: new Abstract: Memory is essential for enabling large language model (LLM) agents to handle long-horizon reasoning tasks. Existing memory mechanisms are largely centralized, typically organizing retrieved information and interaction history within…

arXiv cs.LG TIER_1 English(EN) · Guoxia Wang, Shuai Li, Congliang Chen, Jinle Zeng, Jiabin Yang, Dianhai Yu, Yanjun Ma, Li Shen · 2026-06-10 04:00

AdaGC: Improving LLM Pretraining Stability via Adaptive Gradient Clipping

arXiv:2502.11034v3 Announce Type: replace Abstract: Loss spikes remain a persistent obstacle in large-scale language model pretraining. While previous research has attempted to identify the root cause of loss spikes by investigating individual factors, we observe that, in practic…

arXiv cs.LG TIER_1 English(EN) · Qingbo Wu, Ke Li, Wenzhu Wang, Jie Yu, Ruian Zhang, Lili Liu · 2026-06-10 04:00

Operator Fusion for LLM Inference on the Tensix Architecture

arXiv:2606.09879v1 Announce Type: new Abstract: This study addresses on-device inference bottlenecks of Transformer models on Tenstorrent's Tensix architecture and proposes an operator fusion strategy that enhances data locality. RMSNorm is fused with matrix multiplication in sel…

arXiv cs.CL TIER_1 English(EN) · Yi Yang · 2026-06-10 01:41

GraphInfer-Bench: Benchmarking LLM Inference Ability on Graph Data

Graph analysis underlies many applications whose answers cannot be looked up in a single record or retrieved along a path: laundering rings, drug repurposing, user preference, and scientific theme are all inferred from a node together with its neighbourhood. We introduce GraphInf…

arXiv cs.LG TIER_1 English(EN) · Edouard Oyallon · 2026-06-09 16:40

Unifying Local Communication and Local Updates for LLM Pre-Training

Communication-efficient pre-training of LLMs is increasingly important as training draws on compute distributed across clusters, data centers, and lower-bandwidth links. Many practical methods reduce communication frequency but still rely on synchronous All-Reduce operations that…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-09 16:40

Unifying Local Communication and Local Updates for LLM Pre-Training

Communication-efficient pre-training of LLMs is increasingly important as training draws on compute distributed across clusters, data centers, and lower-bandwidth links. Many practical methods reduce communication frequency but still rely on synchronous All-Reduce operations that…

arXiv cs.CL TIER_1 English(EN) · Lena A. Jäger · 2026-06-09 13:39

Training LLMs to Enforce Multi-Level Instruction Hierarchies via Gravity-Weighted Direct Preference Optimization

Production LLMs receive instructions from sources with very different levels of trust, yet attend to every token with uniform architectural privilege. This is the structural vulnerability that enables malicious prompt injections and, more broadly, leaves models without a principl…

arXiv cs.AI TIER_1 English(EN) · Sophia Ananiadou · 2026-06-09 13:31

Janus: Benchmarking Goal-Conditioned Information Distortion in LLMs

LLM deception is often evaluated through direct markers such as fabricated claims, explicit lies, or strategic concealment. However, many real-world misleading communications do not depend on false statements, rather, they arise from selective treatment of true material facts: om…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-09 11:32

Continual LLM Repurposing: A Predictor-Gated, Block-Wise Sparse Training Method for Dense-to-Sparse LLMs

We study dense-to-sparse continual training as a way to construct channel-sparse large language models from dense checkpoints. Starting from a Qwen2.5-8B dense backbone, we continue training at 32K context and introduce a predictor-gated sparse SwiGLU FFN in the 32K stage. For ea…

arXiv cs.CL TIER_1 English(EN) · Minghui Yu · 2026-06-09 11:32

Continual LLM Repurposing: A Predictor-Gated, Block-Wise Sparse Training Method for Dense-to-Sparse LLMs

We study dense-to-sparse continual training as a way to construct channel-sparse large language models from dense checkpoints. Starting from a Qwen2.5-8B dense backbone, we continue training at 32K context and introduce a predictor-gated sparse SwiGLU FFN in the 32K stage. For ea…

arXiv cs.AI TIER_1 English(EN) · Anke Schmeink · 2026-06-09 11:09

Unifying Data, Memory, and Compute Efficiency in LLM Training: A Survey

Resource constraints increasingly determine what can be trained, fine-tuned, and deployed in large language models (LLMs), yet efficiency is often studied through isolated techniques rather than as an interacting system of limits. This survey adopts a constraint-centric perspecti…

arXiv cs.CL TIER_1 English(EN) · Wentao Zhang · 2026-06-09 10:53

REAL: A Reasoning-Enhanced Graph Framework for Long-Term Memory Management in LLMs

Large Language Models (LLMs) are increasingly expected to interact with users over long time horizons. However, due to their finite context window, LLMs cannot retain all past interactions, making long-term memory management essential for storing, updating, and retrieving histori…

arXiv cs.AI TIER_1 English(EN) · Huawei Shen · 2026-06-09 08:03

ActiveMem: Distributed Active Memory for Long-Horizon LLM Reasoning

Memory is essential for enabling large language model (LLM) agents to handle long-horizon reasoning tasks. Existing memory mechanisms are largely centralized, typically organizing retrieved information and interaction history within a single model context. This design imposes a f…

arXiv cs.CL TIER_1 English(EN) · Samyam Rajbhandari · 2026-06-09 05:48

SpenseGPT: A Practical One-Shot Pruning Technique Enabling Sparse and Dense GEMM for LLM Inference

Semi-structured 2:4 sparsity is widely supported by modern accelerators, providing up to a 2x theoretical speedup. However, its strict 50% sparsity constraint often causes non-negligible accuracy degradation under post-training pruning. Meanwhile, existing relaxed sparsity format…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-09 05:48

SpenseGPT: A Practical One-Shot Pruning Technique Enabling Sparse and Dense GEMM for LLM Inference

Semi-structured 2:4 sparsity is widely supported by modern accelerators, providing up to a 2x theoretical speedup. However, its strict 50% sparsity constraint often causes non-negligible accuracy degradation under post-training pruning. Meanwhile, existing relaxed sparsity format…

arXiv cs.AI TIER_1 English(EN) · Zirui Wang, Yusen Hou, Shaofeng Liang, Bowen Tian, Yanlin Zhang, Wenshuo Chen, Yutao Yue · 2026-06-09 04:00

ABLE: Representing and Mapping LLMs via Attribution-Based LLM Embeddings

arXiv:2606.07524v1 Announce Type: cross Abstract: The explosive growth of large language models (LLMs) has created a heterogeneous and poorly documented ecosystem, making systematic model comparison increasingly important for provenance auditing, security analysis, and model sele…

arXiv cs.LG TIER_1 English(EN) · Yifan Niu, Han Xiao, Dongyi Liu, Wei Zhou, Jia Li · 2026-06-09 04:00

Efficiently Scaling LLM Training with Flexible Context Parallelism

arXiv:2602.21788v2 Announce Type: replace-cross Abstract: Scaling long-context capabilities is crucial for Large Language Models (LLMs). However, real-world data contain a large number of sequences with heterogeneous lengths. Existing training libraries for LLMs rely on static pa…

arXiv cs.LG TIER_1 English(EN) · Tuc Nguyen, Thai Le · 2026-06-09 04:00

ATLAS: Verifier-Guided Adaptive Latent Activation Steering for Efficient LLM Reasoning

arXiv:2601.03093v2 Announce Type: replace Abstract: Recent work on activation and latent steering has demonstrated that modifying internal representations can effectively guide large language models (LLMs) toward improved reasoning and efficiency without updating model parameters…

arXiv cs.LG TIER_1 English(EN) · Jiarui Yao, Xiangxin Zhou, Penghui Qi, Wee Sun Lee, Liefeng Bo, Tianyu Pang · 2026-06-09 04:00

Rethinking Divergence Regularization in LLM RL

arXiv:2606.09821v1 Announce Type: new Abstract: Reinforcement learning (RL) has become a key component of post-training large language models (LLMs). In practice, LLM RL is often off-policy because of training-inference mismatch and policy staleness, making trust-region control e…

arXiv cs.LG TIER_1 English(EN) · Haozhe Hu, Hao Wu, Anhao Zhao, Longwei Ding, Peiran Yin, Yunpu Ma, Xiaoyu Shen · 2026-06-09 04:00

Beyond FLOPs: Benchmarking Real-World Inference Speedups of LLM Pruning under a GEMM-Centric Taxonomy

arXiv:2606.09080v1 Announce Type: new Abstract: Pruning has emerged as a dominant paradigm for accelerating large language model (LLM) inference, spanning a broad spectrum of methods that remove computation across tokens, layers, heads, dimensions, and attention patterns. Despite…

arXiv cs.LG TIER_1 English(EN) · Tuc Nguyen, Thai Le · 2026-06-09 04:00

Beyond Linear Activation Steering: Invertible Latent Transformations for Controlling LLM Behavior

arXiv:2606.08454v1 Announce Type: new Abstract: Activation steering provides a lightweight inference-time mechanism for controlling large language models (LLMs) by modifying their internal activation vectors toward desired behaviors. Most existing methods compute a fixed steering…

arXiv cs.LG TIER_1 English(EN) · Zifan Lyu, Chahine Nejma, Tobias Wegel, Fanny Yang, Florian E. Dorner · 2026-06-09 04:00

Reducing LLM Evaluation Costs with SySRs: A Bandit Algorithm That Provably Exploits Model Similarity

arXiv:2606.07726v1 Announce Type: new Abstract: Large Language Models are typically benchmarked by evaluating every model on every test query. For practitioners seeking the best model to deploy, this is often wasteful: if a model clearly performs worse than others, there is no ne…

arXiv cs.AI TIER_1 English(EN) · Vincent-Daniel Yun, Junhyuk Jo, Sai Praneeth Karimireddy, Sunwoo Lee · 2026-06-09 04:00

Ghosted Layers: Unconstrained Activation Alignment for Recovering Layer-Pruned LLMs

arXiv:2605.15491v2 Announce Type: replace-cross Abstract: Layer pruning removes entire Transformer decoder blocks from large language models, but introduces a mismatch between the hidden state received by the next surviving layer and the distribution it was trained to process, le…

arXiv cs.AI TIER_1 English(EN) · Zeju Qiu, Lixin Liu, Adrian Weller, Han Shi, Weiyang Liu · 2026-06-09 04:00

POET-X: Memory-Efficient LLM Training via Scaling Orthogonal Transformations

arXiv:2603.05500v2 Announce Type: replace-cross Abstract: Efficient and stable training of large language models (LLMs) remains a core challenge in modern machine learning systems. To address this challenge, Reparameterized Orthogonal Equivalence Training (POET), a spectrum-prese…

arXiv cs.AI TIER_1 English(EN) · Minwei Kong, Ao Qu, Xiaotong Guo, Wenbin Ouyang, Chonghe Jiang, Han Zheng, Yining Ma, Dingyi Zhuang, Yuhan Tang, Junyi Li, Shenhao Wang, Haris Koutsopoulos, Hai Wang, Cathy Wu, Jinhua Zhao · 2026-06-09 04:00

AlphaOPT: Formulating Optimization Programs with a Self-Improving LLM Experience Library

arXiv:2510.18428v4 Announce Type: replace Abstract: Optimization modeling underlies critical decision-making across industries, yet remains difficult to automate: natural-language problem descriptions must be translated into precise mathematical formulations and executable solver…

arXiv cs.AI TIER_1 English(EN) · Shijie Zhang, Zheng Xiao, Shiyu Liu, Guohao Sun, Kevin Zhang, Xiang Guo, Rujun Guo, Shaoyu Liu, Wangxiao Zhao, Guanjun Jiang · 2026-06-09 04:00

CLPO: Curriculum Learning Meets Policy Optimization for LLM Reasoning

arXiv:2509.25004v2 Announce Type: replace Abstract: Online reinforcement learning with verifiable rewards (RLVR) has become an effective paradigm for improving the reasoning abilities of large language models, but most methods still optimize reasoning trajectories over the static…

arXiv cs.AI TIER_1 English(EN) · Yuhan Ma, Yong Li, Stefan Schmid · 2026-06-09 04:00

FuseFSS: Efficient and Secure LLM Inference via Function Secret Sharing

arXiv:2606.09551v1 Announce Type: cross Abstract: Two-server secure inference allows a client to query a hosted large language model (LLM) without revealing prompts or embeddings. Recent GPU systems based on function secret sharing (FSS) make linear layers efficient, but fixed-po…

arXiv cs.AI TIER_1 English(EN) · Hong Guo, Nianhui Guo, Weixing Wang, Jona Otholt, Christoph Meinel, Haojin Yang · 2026-06-09 04:00

APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing

arXiv:2606.08761v1 Announce Type: cross Abstract: W4A4 quantization promises full utilization of INT4 Tensor Cores, yet group dequantization overhead on CUDA Cores has driven existing systems to mixed-precision fallbacks. We present the first systematic study of how intra-SM comp…

arXiv cs.AI TIER_1 English(EN) · Zheng Wang, Eric Liu, Linan Jiang, Zhongkai Yu, Zaifeng Pan, Yue Guan, Yuke Wang, Yufei Ding · 2026-06-09 04:00

FlashCP: Load-Balanced, Communication-Efficient Context Parallelism for LLM Training

arXiv:2606.08476v1 Announce Type: cross Abstract: Context parallelism (CP) is essential for training large-scale, long-context language models, as it partitions sequences to reduce memory overhead. However, existing CP methods suffer from workload imbalance, inefficient kernels, …

arXiv cs.AI TIER_1 English(EN) · Haochang Hao, Dehai Min, Zhifang Zhang, Yunbei Zhang, Miao Xu, Yingqiang Ge, Lu Cheng · 2026-06-09 04:00

POISE: Position-Aware Undetectable Skill Injection in LLM Agents

arXiv:2606.07943v1 Announce Type: cross Abstract: Agent skills provide a lightweight mechanism for extending general-purpose agents, but their open format exposes them to skill-poisoning attacks. A practically dangerous injection must stay invisible: if executing the payload dera…

arXiv cs.AI TIER_1 English(EN) · Zhanchao Xu, Haoyang Li, Qingfa Xiao, Fei Teng, Chen Jason Zhang, Lei Chen, Qing Li · 2026-06-09 04:00

From Rigid to Dynamic: Entropy-Guided Adaptive Inference for Long-Context LLMs

arXiv:2606.09508v1 Announce Type: new Abstract: Existing sparse attention and KV cache compression methods for long-context LLM inference typically apply fixed sparsity patterns or uniform budgets across all attention heads, overlooking the substantial variation in attention beha…

arXiv cs.AI TIER_1 English(EN) · Shibing Mo, Jing Liu, Jianchu Xu, Ruilin Wu · 2026-06-09 04:00

Order Matters: Revealing the Hidden Impact of Macro Placement Sequences via Agent-Guided LLM Evolution

arXiv:2606.08904v1 Announce Type: new Abstract: Macro placement is a fundamental step in modern chip physical design, playing a crucial role in determining the solution quality of high-dimensional combinatorial optimization problems. Despite recent advancements in machine learnin…

arXiv cs.AI TIER_1 English(EN) · Shumeng Yang, Yisu Liu, Jiayi Zheng, Zhaohui Yang, Linjing Li · 2026-06-09 04:00

PAEC: Position-Aware Entropy Calibration for LLM Reasoning in RLVR

arXiv:2606.08543v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) improves large language model reasoning but often suffers from rapid policy-entropy collapse, where the policy prematurely concentrates on narrow high-probability reasoning paths…

arXiv cs.AI TIER_1 English(EN) · Siyu Lou, Yao Yan, Yuntian Chen, Quanshi Zhang · 2026-06-09 04:00

Inference Consistency Across LLMs: Evidence from Shared Interactions

arXiv:2606.08129v1 Announce Type: new Abstract: Large language models (LLMs) differ in architecture, training data, and optimization procedures, yet they may still develop similar internal inference patterns. In this paper, we examine this hypothesis using interaction-based expla…

arXiv cs.CL TIER_1 English(EN) · Chenguang Wang · 2026-06-09 01:45

MIRAGE: Polarity-Flipping Encoding Subspaces in LLM Agents

When LLM agents are coerced into covertly encoding sensitive data (Base64, ROT13, acrostic, synonym chains, and beyond), the resulting outputs evade output-side detection but the underlying computation does not. Across nine encoding families and eight models from five architectur…

arXiv cs.LG TIER_1 English(EN) · Tianyu Pang · 2026-06-08 17:58

Rethinking Divergence Regularization in LLM RL

Reinforcement learning (RL) has become a key component of post-training large language models (LLMs). In practice, LLM RL is often off-policy because of training-inference mismatch and policy staleness, making trust-region control essential for stable optimization. Mainstream met…

arXiv cs.AI TIER_1 English(EN) · Stefan Schmid · 2026-06-08 14:30

FuseFSS: Efficient and Secure LLM Inference via Function Secret Sharing

Two-server secure inference allows a client to query a hosted large language model (LLM) without revealing prompts or embeddings. Recent GPU systems based on function secret sharing (FSS) make linear layers efficient, but fixed-point nonlinearities and helper operations remain a …

arXiv cs.AI TIER_1 English(EN) · Qing Li · 2026-06-08 14:02

From Rigid to Dynamic: Entropy-Guided Adaptive Inference for Long-Context LLMs

Existing sparse attention and KV cache compression methods for long-context LLM inference typically apply fixed sparsity patterns or uniform budgets across all attention heads, overlooking the substantial variation in attention behavior among heads and contexts. We observe two di…

arXiv cs.CL TIER_1 English(EN) · Xiaoyu Shen · 2026-06-08 06:26

Beyond FLOPs: Benchmarking Real-World Inference Speedups of LLM Pruning under a GEMM-Centric Taxonomy

Pruning has emerged as a dominant paradigm for accelerating large language model (LLM) inference, spanning a broad spectrum of methods that remove computation across tokens, layers, heads, dimensions, and attention patterns. Despite sharing the same objective, these pruning appro…

arXiv cs.LG TIER_1 English(EN) · Ziyue Li, Yang Li, Tianyi Zhou · 2026-06-08 04:00

Skip a Layer or Loop It? Learning Program-of-Layers in LLMs

arXiv:2606.06574v1 Announce Type: new Abstract: Large language models (LLMs) perform inference by following a fixed depth and order, non-recurrent execution of all layers. We reveal the wide existence of training-free, flexible, dynamic program-of-layers (PoLar), where pretrained…

arXiv cs.AI TIER_1 English(EN) · Anirudh Sekar, Mrinal Agarwal, Rachel Sharma, Akitsugu Tanaka, Jasmine Zhang, Arjun Damerla, Kevin Zhu · 2026-06-08 04:00

Zero-Shot Embedding Drift Detection: A Lightweight Defense Against Prompt Injection in LLMs

arXiv:2601.12359v1 Announce Type: cross Abstract: Prompt injection attacks have become an increasing vulnerability for LLM applications, where adversarial prompts exploit indirect input channels such as emails or user-generated content to circumvent alignment safeguards and induc…

arXiv cs.CL TIER_1 English(EN) · Yuhang Zhou, Yixin Cao, Guangnan Ye · 2026-06-08 04:00

From Correctness to Utility: Gain-Based Prefix Evaluation for LLM Reasoning

arXiv:2606.07190v1 Announce Type: new Abstract: Reasoning prefixes shape the future trajectory of LLM problem solving, yet existing process reward models usually evaluate them through local step correctness. We argue that correctness is a useful but indirect proxy for the effect …

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-08 01:10

Order Matters: Revealing the Hidden Impact of Macro Placement Sequences via Agent-Guided LLM Evolution

Macro placement is a fundamental step in modern chip physical design, playing a crucial role in determining the solution quality of high-dimensional combinatorial optimization problems. Despite recent advancements in machine learning for spatial coordinate determination, the temp…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-08 00:00

Rethinking Divergence Regularization in LLM RL

DRPO improves LLM reinforcement learning stability by replacing hard masks with smooth regularization that provides continuous gradient corrections beyond trust-region boundaries.

arXiv cs.AI TIER_1 English(EN) · Linjing Li · 2026-06-07 09:51

PAEC: Position-Aware Entropy Calibration for LLM Reasoning in RLVR

Reinforcement learning with verifiable rewards (RLVR) improves large language model reasoning but often suffers from rapid policy-entropy collapse, where the policy prematurely concentrates on narrow high-probability reasoning paths. While global entropy regularization can encour…

arXiv cs.AI TIER_1 English(EN) · Yufei Ding · 2026-06-07 06:45

FlashCP: Load-Balanced, Communication-Efficient Context Parallelism for LLM Training

Context parallelism (CP) is essential for training large-scale, long-context language models, as it partitions sequences to reduce memory overhead. However, existing CP methods suffer from workload imbalance, inefficient kernels, and redundant communication due to static sequence…

arXiv cs.CL TIER_1 English(EN) · Thai Le · 2026-06-07 05:01

Beyond Linear Activation Steering: Invertible Latent Transformations for Controlling LLM Behavior

Activation steering provides a lightweight inference-time mechanism for controlling large language models (LLMs) by modifying their internal activation vectors toward desired behaviors. Most existing methods compute a fixed steering direction in the original activation space, typ…

arXiv cs.AI TIER_1 English(EN) · Thibaud Ardoin, Jonas Schäfer, Gerhard Wunder · 2026-06-06 04:00

LLM Self-Recognition: Steering and Retrieving Activation Signatures

arXiv:2606.06315v1 Announce Type: new Abstract: Recent advances in interpretability suggest that large language models (LLMs) implicitly encode signals in their generated text that enable self-recognition of their outputs. We demonstrate that this capability is reliable, even in …

arXiv cs.AI TIER_1 English(EN) · Giuseppe Canonaco, Alberto Pozanco, Daniel Borrajo · 2026-06-06 04:00

Semantic Partial Grounding via LLMs

arXiv:2602.22067v2 Announce Type: replace Abstract: Grounding is a critical step in classical planning, yet it often becomes a computational bottleneck due to the exponential growth in grounded actions and atoms as task size increases. Recent advances in partial grounding have ad…

arXiv cs.AI TIER_1 English(EN) · Rahul Suresh Babu, Laxmipriya Ganesh Iyer · 2026-06-06 04:00

ToolChoiceConfusion: Causal Minimal Tool Filtering for Reliable LLM Agents

arXiv:2606.06284v1 Announce Type: new Abstract: Large language model agents increasingly rely on external tools, but larger tool menus can reduce reliability and efficiency by increasing wrong-tool calls, premature actions, and token cost. Existing tool-selection methods often op…

arXiv cs.AI TIER_1 English(EN) · Nicolás Astorga, Nabeel Seedat, Mihaela van der Schaar · 2026-06-06 04:00

Stepwise Optimization-Like Reasoning by LLMs in Ever-Expanding Search Spaces

arXiv:2606.05464v1 Announce Type: new Abstract: Verifiable reward training has improved mathematical and coding reasoning, but these domains capture only part of step-by-step decision making. Many real-world tasks require finding a high-value feasible plan among many valid altern…

arXiv cs.AI TIER_1 English(EN) · Manya Pandey, Dhruv Kumar, Murari Mandal, Saurabh Deshpande · 2026-06-06 04:00

GITCO: Gated Inference-Time Context Optimization in TSFMs

arXiv:2606.05332v1 Announce Type: new Abstract: Patch-based Time Series Foundation Models (TSFMs) suffer from context poisoning: structurally anomalous patches capture disproportionate attention and silently degrade zero-shot forecast quality. We propose improving TSFM accuracy a…

arXiv cs.CL TIER_1 English(EN) · Lu Cheng · 2026-06-06 02:10

POISE: Position-Aware Undetectable Skill Injection in LLM Agents

Agent skills provide a lightweight mechanism for extending general-purpose agents, but their open format exposes them to skill-poisoning attacks. A practically dangerous injection must stay invisible: if executing the payload derails the user's legitimate task, the resulting fail…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-06 00:00

POISE: Position-Aware Undetectable Skill Injection in LLM Agents

POISE is a stealthy skill-poisoning attack that embeds malicious triggers within benign-looking instructions, achieving high attack success rates while avoiding detection by LLM scanners that are overly sensitive to privileged tool operations.

arXiv cs.CL TIER_1 English(EN) · Guangnan Ye · 2026-06-05 11:56

From Correctness to Utility: Gain-Based Prefix Evaluation for LLM Reasoning

Reasoning prefixes shape the future trajectory of LLM problem solving, yet existing process reward models usually evaluate them through local step correctness. We argue that correctness is a useful but indirect proxy for the effect we ultimately care about: whether a prefix incre…

arXiv cs.CL TIER_1 English(EN) · Yongwei Zhou, Juncheng Diao, Junlin Shang, Peiguang Li, Rongxiang Weng · 2026-06-05 04:00

Predictable Scaling Laws for Optimal Hyperparameters in LLM Continued Pre-Training

arXiv:2606.05610v1 Announce Type: new Abstract: The efficacy of continued pre-training for Large Language Models (LLMs) hinges upon hyperparameter configurations, such as learning rate and batch size. However, current practices often rely on heuristics or grid searches, leading t…

arXiv cs.CL TIER_1 English(EN) · Ruoxi Sun, Quantong Qiu, Juntao Li, Zecheng Tang, Yihang Lou, Min Zhang · 2026-06-05 04:00

Mechanistic Insights into Functional Sparsity in Multimodal LLMs via CoRe Heads

arXiv:2606.05843v1 Announce Type: new Abstract: While Multimodal Large Language Models (MLLMs) demonstrate remarkable proficiency on complex vision-language tasks, the mechanisms by which they extract query-relevant visual features from complex, noisy contexts remain opaque. In t…

arXiv cs.LG TIER_1 English(EN) · Jingyao Wu, Ashley Wang, Keane Ong, Paul Pu Liang, Rosalind Picard · 2026-06-05 04:00

SHALA-LLM: Smartly Handling Ambiguous Labels When Aligning Large Language Models

arXiv:2606.05376v1 Announce Type: new Abstract: Many human-centered tasks, including natural language inference (NLI) and emotion recognition (ER), have multiple plausible interpretations, leading to label ambiguity and challenging disagreements across human annotators. As LLMs a…

arXiv cs.CL TIER_1 English(EN) · Mary Llewellyn, Isobel Thornton, James Bishop, Annie Gray · 2026-06-05 04:00

Correcting for Prompt Dependence in LLM Benchmarking: A Bayesian Hierarchical Model with Embedding-Space Clustering

arXiv:2510.05709v2 Announce Type: replace-cross Abstract: LLM benchmarking metrics often misstate performance and uncertainty as they rely on two assumptions that frequently do not hold in practice: (i) a sufficient number of evaluations are available for classical inference, and…

arXiv cs.CL TIER_1 English(EN) · Jiahao Zeng, Ming Tang, Ningning Ding · 2026-06-05 04:00

Learning to Route LLMs from Implicit Cost-Performance Preferences via Meta-Learning

arXiv:2606.06178v1 Announce Type: cross Abstract: Large language models (LLMs) present a trade-off between performance and cost, where more powerful models incur greater expense. LLM routing aims to mitigate expenses while maintaining performance by sending queries to the most su…

arXiv cs.LG TIER_1 English(EN) · Senmiao Wang, Tiantian Fang, Haoran Zhang, Yushun Zhang, Kunxiang Zhao, Alex Schwing, Ruoyu Sun · 2026-06-05 04:00

PC Layer: Polynomial Weight Preconditioning for Improved LLM Pre-Training

arXiv:2606.06470v1 Announce Type: new Abstract: We propose a preconditioning (PC) layer, a weight parameterization via polynomial preconditioner that ensures stable weight conditioning throughout LLM training. The PC module reshapes the singular-value spectrum of weight matrices …

arXiv cs.AI TIER_1 English(EN) · Ruoyu Sun · 2026-06-04 17:55

PC Layer: Polynomial Weight Preconditioning for Improved LLM Pre-Training

We propose a preconditioning (PC) layer, a weight parameterization via polynomial preconditioner that ensures stable weight conditioning throughout LLM training. The PC module reshapes the singular-value spectrum of weight matrices via low-degree polynomial preconditioning. After…

arXiv cs.AI TIER_1 English(EN) · Gerhard Wunder · 2026-06-04 15:54

LLM Self-Recognition: Steering and Retrieving Activation Signatures

Recent advances in interpretability suggest that large language models (LLMs) implicitly encode signals in their generated text that enable self-recognition of their outputs. We demonstrate that this capability is reliable, even in low-entropy scenarios, and that it can be amplif…

arXiv cs.AI TIER_1 English(EN) · Laxmipriya Ganesh Iyer · 2026-06-04 15:24

ToolChoiceConfusion: Causal Minimal Tool Filtering for Reliable LLM Agents

Large language model agents increasingly rely on external tools, but larger tool menus can reduce reliability and efficiency by increasing wrong-tool calls, premature actions, and token cost. Existing tool-selection methods often optimize semantic relevance, exposing tools whose …

arXiv cs.AI TIER_1 English(EN) · Ningning Ding · 2026-06-04 13:53

Learning to Route LLMs from Implicit Cost-Performance Preferences via Meta-Learning

Large language models (LLMs) present a trade-off between performance and cost, where more powerful models incur greater expense. LLM routing aims to mitigate expenses while maintaining performance by sending queries to the most suitable model. However, existing methods cannot per…

arXiv cs.CL TIER_1 English(EN) · Min Zhang · 2026-06-04 08:18

Mechanistic Insights into Functional Sparsity in Multimodal LLMs via CoRe Heads

While Multimodal Large Language Models (MLLMs) demonstrate remarkable proficiency on complex vision-language tasks, the mechanisms by which they extract query-relevant visual features from complex, noisy contexts remain opaque. In this paper, we present an in-depth interpretabili…

arXiv cs.CL TIER_1 English(EN) · Siheng Xiong, Oguzhan Gungordu, James C. Kerce, Faramarz Fekri · 2026-06-04 04:00

Adaptive Information Control for Search-Augmented LLM Reasoning

arXiv:2602.01672v2 Announce Type: replace Abstract: Search-augmented reasoning agents interleave multi-step reasoning with external retrieval, but uncontrolled retrieval can introduce redundant evidence, saturate the context, and destabilize reinforcement learning (RL). Existing …

arXiv cs.AI TIER_1 English(EN) · Yile Gu, Zhen Zhang, Shaowei Zhu, Xinwei Fu, Jun Wu, Yida Wang, Baris Kasikci · 2026-06-04 04:00

Ekka: Automated Diagnosis of Silent Errors in LLM Inference

arXiv:2606.04594v1 Announce Type: cross Abstract: LLM serving frameworks are quickly evolving with a complex software stack and a vast number of optimizations. The rapid development process can introduce silent errors where output quality silently degrades without any explicit er…

arXiv cs.CL TIER_1 English(EN) · Xin Zhang, Yang Cao, Baoxing Wu, Kai Song, Siying Li · 2026-06-04 04:00

Enhancing Step-by-Step Reasoning in LLMs via External Subgraph Generation

arXiv:2606.04454v1 Announce Type: new Abstract: Large language models have shown strong performance in natural language generation and downstream reasoning tasks, but they still struggle with logical consistency, factual grounding, and interpretability in complex multi-step reaso…

arXiv cs.AI TIER_1 English(EN) · Liulu He, XuanAng Liu, Juntao Liu, Taolue Feng, Ting Lu, Chunsheng Gan, Zhiyv Peng, Yuan Du, Huanrui Yang, Yijiang Liu, Li Du · 2026-06-04 04:00

LiftQuant: Continuous Bit-Width LLMs via Dimension Lifting and Projection

arXiv:2606.04050v1 Announce Type: cross Abstract: Existing quantization methods are fundamentally limited by rigid, integer-based bit-widths (e.g., 2, 3-bit), resulting in a "deployment gap" where Large Language Models cannot be optimally fitted to specific memory budgets. To br…

arXiv cs.CL TIER_1 English(EN) · Changcheng Li, Jiancan Wu, Hengheng Zhang, Zhengsu Chen, Guo An, Junxiang Qiu, Xiang Wang, Qi Tian · 2026-06-04 04:00

Confidence Before Answering: A Paradigm Shift for Efficient LLM Uncertainty Estimation

arXiv:2603.05881v2 Announce Type: replace Abstract: Reliable deployment of large language models (LLMs) requires accurate uncertainty estimation. Existing methods are predominantly answer-first, producing confidence only after generating an answer, which measure the correctness o…

arXiv cs.CL TIER_1 English(EN) · Qinghe Ma, Zhen Zhao, Yiming Wu, Jian Zhang, Lei Bai, Yinghuan Shi · 2026-06-04 04:00

Do Tools Always Help? Learning to Adaptively Invoke Tools for Dual-Mode Multimodal LLM Reasoning

arXiv:2605.19852v2 Announce Type: replace Abstract: Tool-augmented reasoning has emerged as a promising direction for enhancing the reasoning capabilities of multimodal large language models (MLLMs). However, existing studies mainly focus on enabling models to perform tool invoca…

arXiv cs.LG TIER_1 English(EN) · Jack Sanderson, Yihan Wang, Xiaoqian Lu, Gautam Kamath, Yiwei Lu · 2026-06-04 04:00

Sequential Data Poisoning in LLM Post-Training

arXiv:2606.04929v1 Announce Type: new Abstract: LLM post-training proceeds through multiple stages, e.g., supervised fine-tuning (SFT) followed by reinforcement learning from human feedback (RLHF) or direct preference optimization (DPO), where each stage draws data from different…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-04 00:00

Skip a Layer or Loop It? Learning Program-of-Layers in LLMs

Pretrained language models can execute layers dynamically through flexible program-of-layers strategies that improve accuracy while reducing computational overhead compared to standard fixed-depth inference.

arXiv cs.LG TIER_1 English(EN) · Yiwei Lu · 2026-06-03 14:22

Sequential Data Poisoning in LLM Post-Training

LLM post-training proceeds through multiple stages, e.g., supervised fine-tuning (SFT) followed by reinforcement learning from human feedback (RLHF) or direct preference optimization (DPO), where each stage draws data from different, potentially untrusted sources. Existing litera…

arXiv cs.AI TIER_1 English(EN) · Baris Kasikci · 2026-06-03 08:32

Ekka: Automated Diagnosis of Silent Errors in LLM Inference

LLM serving frameworks are quickly evolving with a complex software stack and a vast number of optimizations. The rapid development process can introduce silent errors where output quality silently degrades without any explicit error signals. Diagnosing silent errors is notorious…

arXiv cs.AI TIER_1 English(EN) · Patrick Emami, Nan Qiang, Peter Graf · 2026-06-03 04:00

A Closer Look at World Model Recovery in Supervised Fine-Tuned LLM Planners

arXiv:2606.03685v1 Announce Type: cross Abstract: Supervised fine-tuning (SFT) improves end-to-end classical planning in large language models (LLMs), but do these models also learn to represent and reason about the planning problems they are solving? Due to the relative complexi…

arXiv cs.AI TIER_1 English(EN) · Mubarak Adetunji Ojewale · 2026-06-03 04:00

NetKV: Network-Aware Decode Instance Selection for Disaggregated LLM Inference

arXiv:2606.03910v1 Announce Type: cross Abstract: Disaggregated LLM inference forces the KV cache to traverse the datacenter network before decoding begins, so transfer time enters directly into the Time to First Token (TTFT) budget. Current schedulers route on compute load and p…

arXiv cs.AI TIER_1 English(EN) · Qiao Xiao, Alan Ansell, Boqian Wu, Lu Yin, Mykola Pechenizkiy, Shiwei Liu, Decebal Constantin Mocanu · 2026-06-03 04:00

Leave It to the Specialist: Repairing Sparse LLMs with Sparse Fine-Tuning via Sparsity Evolution

arXiv:2505.24037v3 Announce Type: replace Abstract: Sparse large language models (LLMs) offer an attractive direction toward efficient deployment, but adapting them to downstream tasks remains challenging. The central difficulty is to enable effective task adaptation without sacr…

arXiv cs.AI TIER_1 English(EN) · Shani Goren, Ido Galil, Ran El-Yaniv · 2026-06-03 04:00

When Should LLMs Be Less Specific? Selective Abstraction for Reliable Long-Form Text Generation

arXiv:2602.11908v3 Announce Type: replace Abstract: LLMs are widely used, yet they remain prone to factual errors that erode user trust and limit adoption in high-risk settings. One approach to mitigate this risk is to equip models with uncertainty estimation mechanisms that abst…

arXiv cs.AI TIER_1 English(EN) · Tianxi Gao, Yufan Cai, Yusi Yuan, Jin Song Dong · 2026-06-03 04:00

X-RAY: Mapping LLM Reasoning Capability via Formalized and Calibrated Probes

arXiv:2603.05290v2 Announce Type: replace Abstract: Large language models (LLMs) achieve promising performance, yet their ability to reason remains poorly understood. Existing evaluations largely emphasize task-level accuracy, often conflating pattern matching with reasoning capa…

arXiv cs.AI TIER_1 English(EN) · Hamid Dadkhahi, Firas Trabelsi, Parker Riley, Juraj Juraska, Mehdi Mirzazadeh · 2026-06-03 04:00

Distribution-Calibrated Inference-Time Compute for Thinking LLM-as-a-Judge

arXiv:2512.03019v2 Announce Type: replace-cross Abstract: Thinking Large Language Models (LLMs) used as judges for pairwise preferences remain noisy at the single-sample level, and common aggregation rules (majority vote, soft self-consistency, or instruction-based self-aggregati…

arXiv cs.AI TIER_1 English(EN) · Mehmet Hamza Erol, Xiangpeng Hao, Federico Bianchi, Ciro Greco, Jacopo Tagliabue, James Zou · 2026-06-03 04:00

LLM Test-Time Optimization over Physical Query Plans

arXiv:2602.10387v2 Announce Type: replace-cross Abstract: Traditional query optimization relies on cost-based optimizers that estimate execution cost (e.g., runtime, memory, and I/O) using predefined heuristics and statistical models. Improving these requires substantial engineer…

arXiv cs.LG TIER_1 English(EN) · Yunsheng Yuan, Shaowei Li, Kai Wang, Zhongyuan Sun, Zheng Zhang, Kai Han, Jun Luo, Feng Li · 2026-06-03 04:00

DECA: Decentralized Block-Wise Adam for Efficient Full-Parameter LLM Fine-Tuning on Non-IID Data

arXiv:2606.03209v1 Announce Type: new Abstract: Fine-tuning large language models (LLMs) in privacy-sensitive and resource-constrained environments remains challenging. Since training data are often distributed across multiple clients, decentralized fine-tuning offers a natural p…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-03 00:00

SparDA: Sparse Decoupled Attention for Efficient Long-Context LLM Inference

SparDA is a decoupled sparse attention architecture that improves long-context LLM inference by reducing KV cache bottlenecks and attention complexity through a Forecast projection for lookahead selection.

arXiv cs.AI TIER_1 English(EN) · Mubarak Adetunji Ojewale · 2026-06-02 17:06

NetKV: Network-Aware Decode Instance Selection for Disaggregated LLM Inference

Disaggregated LLM inference forces the KV cache to traverse the datacenter network before decoding begins, so transfer time enters directly into the Time to First Token (TTFT) budget. Current schedulers route on compute load and prefix-cache locality alone, ignoring the topologic…

arXiv cs.AI TIER_1 English(EN) · Peter Graf · 2026-06-02 14:09

A Closer Look at World Model Recovery in Supervised Fine-Tuned LLM Planners

Supervised fine-tuning (SFT) improves end-to-end classical planning in large language models (LLMs), but do these models also learn to represent and reason about the planning problems they are solving? Due to the relative complexity of classical planning problems and the challeng…

arXiv cs.AI TIER_1 English(EN) · Weifang Zhang, Yuzhou Nie, Bowen Pang, Guangrui Ma, Shining Wu · 2026-06-02 04:00

Threshold-Based Exclusive Batching for LLM Inference

arXiv:2606.00516v1 Announce Type: new Abstract: Mixed batching (MB)--interleaving prefill and decode in a single batch--has become the standard scheduling strategy for large language model (LLM) inference due to its efficiency in maximizing compute and memory utilization. However…

arXiv cs.AI TIER_1 English(EN) · Jiangyu Chen, Banyi · 2026-06-02 04:00

Evidence-Gated LLM Priors for Multi-Objective Bayesian Optimization

arXiv:2606.01730v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used as heuristic advisors for black-box optimization, yet their suggestions and self-reported confidence are not necessarily calibrated to downstream objective values. This issue become…

arXiv cs.AI TIER_1 English(EN) · Thi-Nhung Nguyen, Linhao Luo, Rollin Omari, Junae Kim, Thuy-Trang Vu, Dinh Phung · 2026-06-02 04:00

TriAlign: Toward Universal Truth Consistency in Personalized LLM Alignment

arXiv:2606.01755v1 Announce Type: new Abstract: Personalized large language models adapt responses to users' preferences and social attributes, but can introduce substantial universal truth inconsistencies across social groups, where some groups systematically receive less accura…

arXiv cs.AI TIER_1 English(EN) · Mingyi Wang, Zhuoer Shen, Yuheng Bu, Shaofeng Zou · 2026-06-02 04:00

Detector-Evading LLM Paraphrasing via Constrained Policy Optimization

arXiv:2606.00392v1 Announce Type: cross Abstract: AI-text detectors are vulnerable to paraphrasing and detector-guided paraphrasing attacks, but existing detector-evasion methods often lack precise control over semantic preservation. In particular, optimizing directly for detecto…

arXiv cs.AI TIER_1 English(EN) · Qiao Xiao, Boqian Wu, Patrik Okanovic, Tomasz Sternal, Maurice van Keulen, Elena Mocanu, Mykola Pechenizkiy, Decebal Constantin Mocanu, Torsten Hoefler · 2026-06-02 04:00

Dynamic Sparsity for Memory-Efficient LLM Training: From Stability to Practical Scaling

arXiv:2606.00888v1 Announce Type: cross Abstract: Dynamic Sparse Training (DST) offers a promising paradigm for improving the training and inference efficiency of deep neural networks; however, we find that in large language model training, DST can suffer from optimization instab…

arXiv cs.AI TIER_1 English(EN) · Gangmuk Lim, Wanyu Zhao, Brighten Godfrey, Jiaxin Shan, Le Xu, Liguang Xie · 2026-06-02 04:00

Lodestar: An Online-Learning LLM Inference Router

arXiv:2606.00946v1 Announce Type: cross Abstract: Efficiently serving large language model (LLM) inference tasks is crucial both for user-perceived latency such as time-to-first-token (TTFT) and for GPU utilization. However, LLM request routing, that is, assigning each inference …

arXiv cs.AI TIER_1 English(EN) · Wentao Mo, Yang Liu · 2026-06-02 04:00

Distilling Neuro-Symbolic Programs into 3D Multimodal LLMs

arXiv:2606.01215v1 Announce Type: cross Abstract: Current 3D spatial reasoning methods face a fundamental trade-off: neuro-symbolic 3D (NS3D) concept learners achieve interpretable reasoning through compositional programs but are constrained to closed-set concept vocabularies and…

arXiv cs.AI TIER_1 English(EN) · Yixiu Mao, Yun Qu, Qi Wang, Heming Zou, Xiangyang Ji · 2026-06-02 04:00

RLVR Without Invalid Samples: Group-Prioritized Off-Policy Optimization for LLM Reasoning

arXiv:2606.01281v1 Announce Type: cross Abstract: Reinforcement learning with verifiable rewards (RLVR) has emerged as a powerful paradigm for enhancing the reasoning capabilities of large language models (LLMs). However, its effectiveness is substantially hindered by the prevale…

arXiv cs.AI TIER_1 English(EN) · Denica Kjorvezir, Marko Djukanović, Ana Gjorgjevikj, Gjorgjina Cenikj, Tome Eftimov · 2026-06-02 04:00

Consistent and Distinct: Maximum Independent Set Prompt Selection on Similarity Graphs for Efficient LLM Benchmarking

arXiv:2606.01400v1 Announce Type: cross Abstract: Evaluating large language models (LLMs) across comprehensive benchmarks is expensive and time-consuming. We propose a graph-based prompt selection framework that models each benchmark as a similarity graph -- nodes are prompts con…

arXiv cs.AI TIER_1 English(EN) · Liu Qing, Ou Wu, Yi Du · 2026-06-02 04:00

AlphaToken: Decoupling Adaptability and Stability in LLM Post-Training for Path-Aware Response Token Valuation

arXiv:2606.01635v1 Announce Type: cross Abstract: Token selection is pivotal for effective LLM post-training. However, existing methods mostly rely on local heuristics and rarely formulate token selection as a principled valuation of individual response tokens. We introduce $\tex…

arXiv cs.AI TIER_1 English(EN) · Juliusz Ziomek, William Bankes, Lorenz Wolf, Shyam Sundhar Ramesh, Xiaohang Tang, Ilija Bogunovic · 2026-06-02 04:00

LLM-WikiRace Benchmark: How Well Can LLMs Plan over Real-World Knowledge Graphs?

arXiv:2602.16902v4 Announce Type: replace Abstract: We introduce LLM-Wikirace, a benchmark for evaluating planning, reasoning, and world knowledge in large language models (LLMs). In LLM-Wikirace, models must efficiently navigate Wikipedia hyperlinks step by step to reach a targe…

arXiv cs.AI TIER_1 English(EN) · Fangzhou Wu, Sandeep Silwal, Qiuyi Zhang · 2026-06-02 04:00

Capturing LLM Capabilities via Evidence-Calibrated Query Clustering

arXiv:2605.17110v2 Announce Type: replace Abstract: Query clustering organizes queries into groups that reflect shared latent capability demands, enabling capability-aware LLM evaluation. Existing clustering methods, which primarily rely on semantic taxonomies or embeddings, ofte…

arXiv cs.AI TIER_1 English(EN) · Maciej Chrabąszcz, Filip Szatkowski, Bartosz Wójcik, Jan Dubiński, Tomasz Trzciński, Sebastian Cygert · 2026-06-02 04:00

Efficient LLM Moderation with Multi-Layer Latent Prototypes

arXiv:2502.16174v4 Announce Type: replace-cross Abstract: Although modern LLMs are aligned with human values during post-training, robust moderation remains essential to prevent harmful outputs at deployment time. Existing approaches suffer from performance-efficiency trade-offs …

arXiv cs.LG TIER_1 English(EN) · Andrei Panferov, Erik Schultheis, Soroush Tabesh, Dan Alistarh · 2026-06-02 04:00

Quartet II: Accurate LLM Pre-Training in NVFP4 via Improved Unbiased Gradient Estimation

arXiv:2601.22813v2 Announce Type: replace Abstract: The NVFP4 lower-precision format, supported in hardware by NVIDIA Blackwell GPUs, promises to allow, for the first time, end-to-end fully-quantized pre-training of massive models such as LLMs. Yet, existing quantized training me…

arXiv cs.LG TIER_1 English(EN) · Tuan Nguyen, Long Tran-Thanh · 2026-06-02 04:00

Safety Game: Inference-Time Alignment of Black-Box LLMs via Constrained Optimization

arXiv:2510.09330v3 Announce Type: replace Abstract: Ensuring that large language models (LLMs) comply with safety requirements is a central challenge in AI deployment. Existing alignment approaches primarily operate during training, such as through fine-tuning or reinforcement le…

arXiv cs.LG TIER_1 English(EN) · Kiran Nayudu, Aswini Nutakki, Sai Vinay Naidu, Ashwin Shanmugasundaram · 2026-06-02 04:00

CRMA: A Spectrally Bounded Backbone for Modular Continual Fine-Tuning of LLMs

arXiv:2606.00382v1 Announce Type: new Abstract: Sequential fine-tuning of large language models forces a choice: let the shared substrate keep learning and accept catastrophic forgetting, or freeze it after task one and foreclose cross-task refinement. Per-task adapter methods (L…

arXiv cs.CL TIER_1 English(EN) · Junjie Chen, Yuxi Dong, Haitao Li, Weihang Su, Yujia Zhou, Min Zhang, Yiqun Liu, Qinyao Ai · 2026-06-02 04:00

Benchmarking LLM-as-a-Judge for Long-Form Output Evaluation

arXiv:2606.01629v1 Announce Type: new Abstract: As large language models (LLMs) are increasingly used for long-form generation, reliably evaluating long-form outputs has become a critical challenge. LLM-as-a-judge offers a scalable alternative to human evaluation, yet its reliabi…

arXiv cs.CL TIER_1 English(EN) · Hanno Hiss, Jasper Dekoninck, Martin Vechev · 2026-06-02 04:00

Learning from Saturated Data: Signals Beyond Correctness for LLM Training

arXiv:2606.01436v1 Announce Type: new Abstract: The growing capabilities of large language models (LLMs) have led to the saturation of many benchmarks and training datasets used to improve them. Motivated by this, we investigate whether questions solved with perfect empirical acc…

arXiv cs.CL TIER_1 English(EN) · Sagar Bhetwal, Rajan Bastakoti, Nirajan Acharya, Gaurav Kumar Gupta · 2026-06-02 04:00

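Benchmarking Local LLMs for Natural-Language-to-SQL Queries in Biopharmaceutical Manufacturing on Consumer-Grade Hardware: An Empirical Benchmark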

arXiv:2606.01338v1 Announce Type: new Abstract: Biopharmaceutical manufacturing organizations operate under regulatory frameworks such as FDA guidance, EU Good Manufacturing Practice (GMP), and the EU AI Act, which can restrict the use of cloud-based artificial intelligence syste…

arXiv cs.AI TIER_1 English(EN) · Zihan Guan, Rituparna Datta, Mengxuan Hu, Shunshun Liu, Aiying Zhang, Prasanna Balachandran, Sheng Li, Anil Vullikanti · 2026-06-02 04:00

Are Large Language Models Ready for Neural-Integrated Mechanistic Modeling? A Benchmark and Agentic Framework

arXiv:2602.18008v2 Announce Type: replace-cross Abstract: Large language models (LLMs) have shown promise in constructing mechanistic models from data. However, existing evaluations largely focus on simplified settings and fail to capture the complexity of real-world scientific m…

arXiv cs.AI TIER_1 English(EN) · Yi Li, Hongze Shen, Lexiang Tang, Xin Li, Xinpeng Ding, Yinsong Liu, Deqiang Jiang, Xing Sun, Xiaomeng Li · 2026-06-02 04:00

DenseMLLM: Standard Multimodal Large Language Models for Dense Prediction

arXiv:2602.14134v2 Announce Type: replace-cross Abstract: Multimodal Large Language Models (MLLMs) have demonstrated exceptional capabilities in high-level visual understanding. However, extending these models to fine-grained dense prediction tasks, such as semantic segmentation …

arXiv cs.AI TIER_1 English(EN) · Bogdan Zagribelnyy, Ivan Ilin, Maksim Kuznetsov, Nikita Bondarev, Mathieu Reymond, Roman Schutski, Thomas MacDougall, Rim Shayakhmetov, Zulfat Miftakhutdinov, Mikolaj Mizera, Vladimir Aladinskiy, Alex Aliper, Alex Zhavoronkov · 2026-06-02 04:00

When a Single Answer Is Not Enough: Rethinking Single-Step Retrosynthesis Route-Design Benchmarks for Large Language Models

arXiv:2602.03554v2 Announce Type: replace-cross Abstract: Recent progress has expanded the use of large language models (LLMs) in drug discovery, including synthesis planning. However, objective evaluation of retrosynthesis performance remains limited. Existing benchmarks and met…

arXiv cs.AI TIER_1 English(EN) · Yu He, Yingxi Li, Colin White, Ellen Vitercik · 2026-06-02 04:00

Can Large Language Models Reason Structurally? Benchmarking Through the Lens of Data Structures

arXiv:2505.24069v4 Announce Type: replace-cross Abstract: Large language models (LLMs) are deployed on increasingly complex tasks that require multi-step decision-making. Understanding their algorithmic reasoning abilities is therefore crucial. However, we lack a diagnostic bench…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-01 05:50

Evidence-Gated LLM Priors for Multi-Objective Bayesian Optimization

Large language models (LLMs) are increasingly used as heuristic advisors for black-box optimization, yet their suggestions and self-reported confidence are not necessarily calibrated to downstream objective values. This issue becomes more pronounced in multi-objective Bayesian op…

arXiv cs.LG TIER_1 English(EN) · Kairun Zhang, Haoyu Li, Yanjun Zhao, Yifan Sun, Huan Zhang · 2026-06-01 04:00

Learning a Zeroth-Order Optimizer for Fine-Tuning LLMs

arXiv:2510.00419v2 Announce Type: replace Abstract: Zeroth-order optimizers have recently emerged as an attractive approach for fine-tuning large language models (LLMs), as they avoid backpropagation and can substantially reduce memory overhead relative to standard first-order tr…

arXiv cs.AI TIER_1 English(EN) · Yuanjian Xu, Jianing Hao, Guang Zhang, Zhong Li · 2026-06-01 04:00

D$^3$: Dynamic Directed-Graph-Constrained Data Scheduling for LLM Training

arXiv:2605.31164v1 Announce Type: cross Abstract: Training data plays a central role in large language models (LLMs) optimization, motivating extensive research on data scheduling strategies. Most existing approaches concentrate on adjusting the overall data distribution but negl…

arXiv cs.AI TIER_1 English(EN) · Mikkel Godsk Jørgensen, Lars Kai Hansen · 2026-06-01 04:00

Steering Large Language Models? Sparse Autoencoders Can, in Fact, Outperform Simple Baselines

arXiv:2605.31183v1 Announce Type: cross Abstract: Sparse Autoencoders (SAEs) have been seen as a promising avenue for exploring the internals of Large Language Models (LLMs) and for steering model output generation. When AxBench - a model steering benchmark - was introduced in Wu…

arXiv cs.AI TIER_1 English(EN) · Azim Ospanov, Zijin Feng, Jiacheng Sun, Haoli Bai, Xin Shen, Farzan Farnia · 2026-06-01 04:00

HERMES: Towards Efficient and Verifiable Mathematical Reasoning in LLMs

arXiv:2511.18760v2 Announce Type: replace Abstract: Informal mathematics has been central to modern large language model (LLM) reasoning, offering flexibility and efficient construction of arguments. However, purely informal reasoning is prone to logical gaps and subtle errors th…

arXiv cs.AI TIER_1 English(EN) · Yuzhe Gu, Xiyu Liang, Jiaojiao Zhao, Enmao Diao · 2026-06-01 04:00

OBCache: Optimal Brain KV Cache Pruning for Efficient Long-Context LLM Inference

arXiv:2510.07651v2 Announce Type: replace-cross Abstract: Large language models (LLMs) with extended context windows enable powerful applications but impose significant memory overhead, as caching all key-value (KV) states scales linearly with sequence length and batch size. Exis…

arXiv cs.AI TIER_1 English(EN) · Saeed Mohammadzadeh, Erfan Hamdi, Joel Shor, Emma Lejeune · 2026-06-01 04:00

FEM-Bench: A Structured Scientific Reasoning Benchmark for Evaluating Code-Generating LLMs

arXiv:2512.20732v2 Announce Type: replace-cross Abstract: As LLMs advance their reasoning capabilities about the physical world, the absence of rigorous benchmarks for evaluating their ability to generate scientifically valid physical models has become a critical gap. Computation…

arXiv cs.AI TIER_1 English(EN) · Sher Badshah, Ali Emami, Hassan Sajjad · 2026-06-01 04:00

SCOPE: Selective Conformal Optimized Pairwise LLM Judging

arXiv:2602.13110v3 Announce Type: replace-cross Abstract: Large language models (LLMs) are increasingly used as scalable judges in pairwise evaluation, but they remain prone to miscalibration and biases. We propose SCOPE (Selective Conformal Optimized Pairwise Evaluation), a fram…

arXiv cs.CL TIER_1 English(EN) · Sander Land, Daniel M. Bikel · 2026-06-01 04:00

Auditing LLM Benchmarks with Item Response Theory

arXiv:2605.30504v1 Announce Type: new Abstract: LLM benchmark labels are frozen at release and silently propagated into downstream benchmarks, errors and all. We introduce an Item Response Theory-based indicator that surfaces likely mislabels at 95% precision in the top 200 examp…

arXiv cs.CL TIER_1 English(EN) · Sicheng Feng, Zigeng Chen, Gongfan Fang, Xinyin Ma, Xinchao Wang · 2026-06-01 04:00

dMoE: dLLMs with Learnable Block Experts

arXiv:2605.30876v1 Announce Type: new Abstract: Diffusion Large Language Models (dLLMs) have recently emerged as a promising alternative to autoregressive models, offering competitive performance while naturally supporting parallel decoding. However, as dLLMs are increasingly int…

arXiv cs.CL TIER_1 English(EN) · Yuanjian Xu, Jianing Hao, Wanbo Zhang, Zhong Li, Guang Zhang · 2026-06-01 04:00

Towards Efficient LLM Annealing with Principled Sample Selection

arXiv:2605.31175v1 Announce Type: new Abstract: The annealing phase is a pivotal convergence stage in LLM pre-training that ultimately determines final model quality. However, effectively selecting training data during this phase remains a key challenge. Current strategies rely o…

arXiv cs.CL TIER_1 English(EN) · Zheyu Zhang, Shuo Yang, Gjergji Kasneci · 2026-06-01 04:00

Integrating Rewarded Perturbations in LLM Post-Training

arXiv:2605.31494v1 Announce Type: new Abstract: Post-training of language models is commonly framed as a sample-score-update loop implemented by gradient descent. A recent line of work, exemplified by RandOpt, relocates this loop to weight space, sampling Gaussian perturbations a…

arXiv cs.CL TIER_1 English(EN) · Aniket Anand, Janvijay Singh, Zhewei Sun, Dilek Hakkani-T\"ur, Nick Feamster · 2026-06-01 04:00

Measuring, Localizing, and Removing Alignment Signatures in Large Language Models

arXiv:2605.30526v1 Announce Type: cross Abstract: Aligned language models often exhibit a recognizable AI-like style, yet its connection to post-training and internal representations remains poorly understood. In this work, we study whether post-training introduces or amplifies A…

arXiv cs.CL TIER_1 English(EN) · Quentin Lemesle, Léane Jourdan, Daisy Munson, Pierre Alain, Jonathan Chevelu, Arnaud Delhay, Damien Lolive · 2026-06-01 04:00

*-PLUIE: Personalized Evaluation with Large Language Models for Improved Evaluation

arXiv:2602.15778v2 Announce Type: replace Abstract: Evaluating the quality of automatically generated text often relies on LLM-as-a-judge (LLM-judge) methods. While effective, these approaches are computationally expensive and require post-processing. To address these limitations…

arXiv cs.CL TIER_1 English(EN) · Juneyoung Park, Yuri Hong, Seongwan Kim, Jaeho Lee · 2026-06-01 04:00

Memory-Efficient Structured Backpropagation for On-Device LLM Fine-Tuning

arXiv:2602.13069v2 Announce Type: replace-cross Abstract: On-device fine-tuning enables privacy-preserving personalization of large language models, but mobile devices impose severe memory constraints, typically 6--12GB shared across all workloads. Existing approaches force a tra…

arXiv cs.LG TIER_1 English(EN) · Yuxin Yang, Aoxiong Zeng, Xiangquan Yang · 2026-06-01 04:00

Long-Term Effects of Data Selection in LLM Fine-Tuning

arXiv:2605.30537v1 Announce Type: new Abstract: Data selection is increasingly used to reduce the cost of large language model (LLM) fine-tuning, with recent methods prioritizing samples by current utility, diversity, quality, or influence. This paper studies a different question…

arXiv cs.AI TIER_1 English(EN) · Stephane Hatgis-Kessell, Emma Brunskill · 2026-06-01 04:00

When Can Large Language Models (LLMs) Serve as Sufficient Policy Optimizers for Sequential Reinforcement Learning Tasks?

arXiv:2605.30719v1 Announce Type: cross Abstract: We study when large language models (LLMs) can serve as effective black-box policy optimizers for reinforcement learning (RL) tasks, i.e., when can we replace classical RL algorithms with an LLM? We explore this question by introd…

arXiv cs.LG TIER_1 English(EN) · Peihao Wang, Shan Yang, Xijun Wang, Tesi Xiao, Xin Liu, Changlong Yu, Yu Lou, Pan Li, Zhangyang Wang, Ming Lin, René Vidal · 2026-06-01 04:00

Beyond Test-Time Memorization: State-Space Optimal Control for LLM Reasoning

arXiv:2603.09221v2 Announce Type: replace Abstract: Associative memory has long underpinned the design of sequential models. Beyond recall, humans reason by projecting future states and selecting goal-directed actions, a capability that modern language models increasingly require…

arXiv cs.AI TIER_1 English(EN) · Liwei Kang, Yee Whye Teh, Wee Sun Lee · 2026-06-01 04:00

LinTree: Improving LLM Reasoning with Explicitly Structured Search History

arXiv:2605.31492v1 Announce Type: new Abstract: Large language models (LLMs) often solve reasoning problems by generating intermediate traces that explore and revise partial solutions. From a search perspective, these traces can be viewed as linearized search trees, where the mod…

arXiv cs.AI TIER_1 English(EN) · Vincent Granville · 2026-06-01 04:00

LLMs Without Deep Neural Networks: New Architecture, Benefits, and Case Studies

arXiv:2605.30385v1 Announce Type: cross Abstract: The purpose of this article is to provide validation to my deep neural network alternative in the context of LLMs. Very recently, there has been a significant interest by Chinese researchers in a model called RBF network, as a sub…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-30 00:00

Exploring the Limits of LLM Adaptability: How Internalized Prior Knowledge Affects Annotation Task Performance

Large language models exhibit limited ability to correct zero-shot errors through prompting, with model performance more strongly linked to definition-specific familiarity than text-level memorization metrics.

arXiv cs.CL TIER_1 English(EN) · Gjergji Kasneci · 2026-05-29 16:16

Integrating Rewarded Perturbations in LLM Post-Training

Post-training of language models is commonly framed as a sample-score-update loop implemented by gradient descent. A recent line of work, exemplified by RandOpt, relocates this loop to weight space, sampling Gaussian perturbations around a pretrained model and ensembling the top-…

arXiv cs.AI TIER_1 English(EN) · Wee Sun Lee · 2026-05-29 16:13

LinTree: Improving LLM Reasoning with Explicitly Structured Search History

Large language models (LLMs) often solve reasoning problems by generating intermediate traces that explore and revise partial solutions. From a search perspective, these traces can be viewed as linearized search trees, where the model extends a partial solution, abandons it when …

arXiv cs.AI TIER_1 English(EN) · Lars Kai Hansen · 2026-05-29 11:53

Steering Large Language Models? Sparse Autoencoders Can, in Fact, Outperform Simple Baselines

Sparse Autoencoders (SAEs) have been seen as a promising avenue for exploring the internals of Large Language Models (LLMs) and for steering model output generation. When AxBench - a model steering benchmark - was introduced in Wu et al. (2025), SAEs did not seem to live up to th…

arXiv cs.CL TIER_1 English(EN) · Guang Zhang · 2026-05-29 11:42

Towards Efficient LLM Annealing with Principled Sample Selection

The annealing phase is a pivotal convergence stage in LLM pre-training that ultimately determines final model quality. However, effectively selecting training data during this phase remains a key challenge. Current strategies rely on empirical heuristics, such as domain filtering…

arXiv cs.AI TIER_1 English(EN) · Zhong Li · 2026-05-29 11:13

D$^3$: Dynamic Directed-Graph-Constrained Data Scheduling for LLM Training

Training data plays a central role in large language models (LLMs) optimization, motivating extensive research on data scheduling strategies. Most existing approaches concentrate on adjusting the overall data distribution but neglect the underlying interactions between samples du…

arXiv cs.LG TIER_1 English(EN) · Youting Wang, Yuan Tang, Bowen Liu, Xuan Liu, Dingyan Shang · 2026-05-29 04:00

When LLM Reward Design Fails: Diagnosis-Driven Refinement for Sparse, Structured Reinforcement Learning

arXiv:2605.28918v1 Announce Type: new Abstract: For sparse, structured reinforcement-learning tasks with semantic reward-function interfaces, LLM-generated reward shaping is better framed as debugging than one-shot generation. We study PPO-trained agents using MiniGrid as core ev…

arXiv cs.LG TIER_1 English(EN) · Karim Galliamov, Rochelle Choenni, Ivan Titov · 2026-05-29 04:00

Knowledge Offloading: Decomposing Large Language Models into a Sparse Backbone and Memory Modules

arXiv:2605.29075v1 Announce Type: new Abstract: LLMs encode both general capabilities and domain-specific knowledge in a single set of parameters. We ask whether this capacity can be reorganized: keeping broadly useful computation in a shared backbone, while moving specialized kn…

arXiv cs.LG TIER_1 English(EN) · Kexin Chu, Yang Zhou, Wei Zhang · 2026-05-29 04:00

MarginGate: Sparse Margin-Triggered Verification for Batch-Invariant LLM Inference

arXiv:2605.30218v1 Announce Type: new Abstract: Temperature-zero BF16 LLM inference is often treated as reproducible, yet the same request can emit different tokens when decoded alone or inside a larger batch. Existing fixes use batch-invariant operators or LLM-42's per-token ver…

arXiv cs.AI TIER_1 English(EN) · Haoyang Liu, Jie Wang, Boxuan Niu, Xiongwei Han, Yian Xu, Mingxuan Ye, Zijie Geng, Fangzhou Zhu, Tao Zhong, Mingxuan Yuan, Jianye Hao · 2026-05-29 04:00

Opt-Verifier: Unleashing the Power of LLMs in Optimization Modeling via Bilateral Verification

arXiv:2605.29556v1 Announce Type: new Abstract: Building mathematical optimization models is critical in operations research (OR), while it requires substantial human expertise. Recent advancements have utilized large language models (LLMs) to automate this modeling process. Howe…

arXiv cs.AI TIER_1 English(EN) · Yundong Kim, Heyoung Yang · 2026-05-29 04:00

TRACE: Toulmin-Based Reasoning Assessment via Constructive Elements for LLM CoT Evaluation

arXiv:2605.29656v1 Announce Type: new Abstract: Evaluating open-ended outputs from large language models (LLMs) remains challenging due to the absence of ground truth. Existing metrics rely on final-answer accuracy or surface-level statistics, leaving the reasoning process itself…

arXiv cs.AI TIER_1 English(EN) · Tong Ye, Hang Yu, Tengfei Ma, Xuhong Zhang, Jianguo Li, Peng Di, Peiyu Liu, Jianwei Yin, Wenhai Wang · 2026-05-29 04:00

Domain-Specific Data Synthesis for LLMs via Minimal Sufficient Representation Learning

arXiv:2605.30039v1 Announce Type: new Abstract: Large Language Models have demonstrated remarkable progress in general-purpose capabilities and can achieve strong performance in specific domains through fine-tuning on domain-specific data. However, acquiring high-quality data for…

arXiv cs.AI TIER_1 English(EN) · Fares Nabil Ibrahim, Nafis Saami Azad, Raiyan Abdul Baten · 2026-05-29 04:00

Anchorless Diversification for Parallel LLM Ideation

arXiv:2605.30150v1 Announce Type: new Abstract: LLMs are increasingly used to generate candidate-idea pools for creative tasks where broad exploration is valuable. Parallel inference can be attractive in this setting when it broadens the pool while retaining quality and cost effi…

arXiv cs.AI TIER_1 English(EN) · Yalun Dai, Yangyu Huang, Tongshen Yang, Yonghan Wang, Xin Zhang, Wenshan Wu, Qihao Zhao, Hao Li, Yuanyuan Gao, Kim-Hui Yap, Scarlett Li · 2026-05-29 04:00

Demystifying Data Organization for Enhanced LLM Training

arXiv:2605.30334v1 Announce Type: new Abstract: Large Language Models (LLMs) have revolutionized various fields, yet their training efficiency is heavily reliant on effective data curation. While data selection has been widely studied, the strategic data organization for enhanced…

arXiv cs.AI TIER_1 English(EN) · Boqi Chen, José Antonio Hernández López, Aren A. Babikian · 2026-05-29 04:00

Projectional Decoding: Towards Semantic-Aware LLM Generation

arXiv:2605.30054v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly used to generate software artifacts across many software engineering (SE) tasks, yet ensuring the semantic validity of these artifacts remains a fundamental challenge. Existing constra…

arXiv cs.AI TIER_1 English(EN) · Kajetan Schweighofer, Conor F. Hayes, Roberto Dailey, Risto Miikkulainen, Xin Qiu · 2026-05-29 04:00

Overcoming Forgetting in LLM Fine-Tuning with Evolution Strategies

arXiv:2605.30148v1 Announce Type: cross Abstract: Evolution Strategies (ES) has recently emerged as a competitive alternative to reinforcement learning (RL) for large language model (LLM) fine-tuning, offering advantages through simplicity, scalability, and inference-only trainin…

arXiv cs.AI TIER_1 English(EN) · Ziwen Xu, Haiwen Hong, Linsong Yu, Benglei Cui, Longtao Huang, Hui Xue, Ningyu Zhang · 2026-05-29 04:00

How Does LoRA Memorize? Parametric Memory Laws for LLM Fine-Tuning

arXiv:2605.30260v1 Announce Type: cross Abstract: Large Language Models (LLMs) must continuously learn and update knowledge to remain effective in dynamic real-world environments. While Low-Rank Adaptation (LoRA) is widely used for such memory updates, existing studies mainly rel…

arXiv cs.AI TIER_1 English(EN) · Zhongzhi Li, Xuansheng Wu, Yijiang Li, Lijie Hu, Ninghao Liu · 2026-05-29 04:00

Less Is More: Synthesizing Diverse Data in LLM Feature Space with Sparse Autoencoders

arXiv:2602.10388v3 Announce Type: replace-cross Abstract: The diversity of post-training data is critical for effective downstream performance in large language models (LLMs). Many existing approaches to constructing post-training data quantify diversity using text-based metrics …

arXiv cs.CL TIER_1 English(EN) · Vinay Samuel, Yapei Chang, Mohit Iyyer · 2026-05-29 04:00

Recovering Diversity Without Sacrificing Alignment: A DPO Recipe for Post-Trained LLMs

arXiv:2605.30021v1 Announce Type: new Abstract: Many open-ended instructions have multiple valid answers that users can benefit from seeing, but post-training often narrows an LLM's output space toward a small set of canonical responses. We introduce REDIPO, an offline DPO data-c…

arXiv cs.CL TIER_1 English(EN) · Jiamin Chen, Yidi Wu, Qiexiang Wang, Qianben Chen, Yuchen Li, Yansen Zhang, Xiaokun Zhang, Wangchunshu Zhou, Chen Ma · 2026-05-29 04:00

SEAL: Can Saturated Benchmarks Be Revived with LLM-as-a-Meta-Judge?

arXiv:2605.30104v1 Announce Type: new Abstract: Widely used language-model benchmarks are increasingly saturated, with frontier systems often receiving near-tied scores that standard metrics cannot resolve. Rather than constructing harder alternatives, we ask whether existing tas…

arXiv cs.CL TIER_1 English(EN) · Shaojie Wang, Liang Zhang · 2026-05-29 04:00

Know What to Solve Before Solving How: Pre-Planning Empowers Mathematical Reasoning in LLMs

arXiv:2605.30245v1 Announce Type: new Abstract: Current plan-based reasoning methods improve large language models (LLMs) by inserting a planning stage before execution, giving rise to the question $\rightarrow$ plan $\rightarrow$ cot paradigm. While effective, a closer examinati…

arXiv cs.CL TIER_1 English(EN) · Haoxiang Jiang, Zihan Dong, Tianci Liu, Wanying Wang, Ran Xu, Tony Yu, Linjun Zhang, Haoyu Wang · 2026-05-29 04:00

RUBRIC-ARROW: Alternating Pointwise Rubric Reward Modeling for LLM Post-Training in Non-Verifiable Domains

arXiv:2605.29156v1 Announce Type: cross Abstract: Pointwise reward modeling offers critical signals for LLM post-training, yet struggles with absolute scoring in subjective, non-verifiable settings. Rubric-based methods address this by decomposing evaluation into explicit criteri…

arXiv cs.CL TIER_1 English(EN) · Anany Kotawala · 2026-05-29 04:00

Resolution Diagnostics for Paired LLM Evaluation

arXiv:2605.30315v1 Announce Type: new Abstract: Across two public LLM leaderboards, many displayed pairwise rankings do not meet a conventional paired-test resolution target under the actual paired evaluation design: 11 of 40 Open LLM Leaderboard v1 pairwise comparisons and 4 of …

arXiv cs.AI TIER_1 English(EN) · Daniel Lee, Owen Queen, James Zou · 2026-05-29 04:00

ReasonOps: Operator Segmentation of LLM Reasoning Traces

arXiv:2605.29192v1 Announce Type: new Abstract: Chain-of-thought traces from large reasoning models can span tens of thousands of tokens, yet we lack a vocabulary for describing their internal structure. Previous methods developed to analyze chain-of-thought traces are either too…

arXiv cs.AI TIER_1 English(EN) · Zhihao Liu, Yifan Wu, Jian Lou, Di Wang, Yuxi Zhou, Yuke Hu · 2026-05-29 04:00

Aligned but Fragile: Improving the Robustness of LLM Safety via Zeroth-Order Optimization

arXiv:2605.29396v1 Announce Type: new Abstract: Safety alignment for large language models (LLMs) aims to reduce harmful or unsafe behavior while preserving general utility. However, recent findings reveal that alignment effects can be fragile: lightweight post-alignment manipula…

arXiv cs.LG TIER_1 English(EN) · Alaa Khamis, Alaa Maalouf · 2026-05-29 04:00

Efficient Test-Time Finetuning of LLMs via Convex Reformulation and Gradient Caching

arXiv:2605.30337v1 Announce Type: new Abstract: Test-time finetuning (TTFT) is a rapidly evolving paradigm that adapts a language model to each prompt by retrieving related sequences, updating the model on them, and then evaluating the prompt. However, TTFT is only practical if i…

arXiv cs.LG TIER_1 English(EN) · Xiaowen Jiang, Andrei Semenov, Sebastian U. Stich · 2026-05-29 04:00

Enhancing LLM Training via Spectral Clipping

arXiv:2603.14315v2 Announce Type: replace Abstract: While spectral-based optimizers like Muon operate directly on the spectrum of updates, standard adaptive methods such as AdamW do not account for the spectral structure of weights and gradients, leaving them vulnerable to two em…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-29 00:00

dMoE: dLLMs with Learnable Block Experts

Diffusion large language models combined with mixture-of-experts architectures face a mismatch between block parallel decoding and token-level expert selection, which dMoE addresses by aggregating token-level distributions into block-level routing to reduce activated experts and …

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-29 00:00

Domain-Specific Data Synthesis for LLMs via Minimal Sufficient Representation Learning

DOMINO enables domain-specific data synthesis through an inductive approach that learns domain representations from reference examples, improving code benchmark performance without requiring explicit domain descriptions.

arXiv cs.LG TIER_1 English(EN) · Alaa Maalouf · 2026-05-28 17:59

Efficient Test-Time Finetuning of LLMs via Convex Reformulation and Gradient Caching

Test-time finetuning (TTFT) is a rapidly evolving paradigm that adapts a language model to each prompt by retrieving related sequences, updating the model on them, and then evaluating the prompt. However, TTFT is only practical if it is fast: selection and finetuning both happen …

arXiv cs.AI TIER_1 English(EN) · Scarlett Li · 2026-05-28 17:58

Demystifying Data Organization for Enhanced LLM Training

Large Language Models (LLMs) have revolutionized various fields, yet their training efficiency is heavily reliant on effective data curation. While data selection has been widely studied, the strategic data organization for enhanced training remains an underexplored area, particu…

arXiv cs.CL TIER_1 English(EN) · Anany Kotawala · 2026-05-28 17:54

Resolution Diagnostics for Paired LLM Evaluation

Across two public LLM leaderboards, many displayed pairwise rankings do not meet a conventional paired-test resolution target under the actual paired evaluation design: 11 of 40 Open LLM Leaderboard v1 pairwise comparisons and 4 of 9 MMLU-Pro top-10 adjacent-rank pairs are unreso…

arXiv cs.AI TIER_1 English(EN) · Ningyu Zhang · 2026-05-28 17:22

How Does LoRA Memorize? Parametric Memory Laws for LLM Fine-Tuning

Large Language Models (LLMs) must continuously learn and update knowledge to remain effective in dynamic real-world environments. While Low-Rank Adaptation (LoRA) is widely used for such memory updates, existing studies mainly rely on qualitative downstream evaluations, leaving t…

arXiv cs.CL TIER_1 English(EN) · Liang Zhang · 2026-05-28 17:11

Know What to Solve Before Solving How: Pre-Planning Empowers Mathematical Reasoning in LLMs

Current plan-based reasoning methods improve large language models (LLMs) by inserting a planning stage before execution, giving rise to the question $\rightarrow$ plan $\rightarrow$ cot paradigm. While effective, a closer examination reveals an inherent paradigm-level gap: both …

arXiv cs.LG TIER_1 English(EN) · Wei Zhang · 2026-05-28 16:50

MarginGate: Sparse Margin-Triggered Verification for Batch-Invariant LLM Inference

Temperature-zero BF16 LLM inference is often treated as reproducible, yet the same request can emit different tokens when decoded alone or inside a larger batch. Existing fixes use batch-invariant operators or LLM-42's per-token verification, incurring cost even when most steps a…

arXiv cs.AI TIER_1 English(EN) · Raiyan Abdul Baten · 2026-05-28 16:10

Anchorless Diversification for Parallel LLM Ideation

LLMs are increasingly used to generate candidate-idea pools for creative tasks where broad exploration is valuable. Parallel inference can be attractive in this setting when it broadens the pool while retaining quality and cost efficiency. We study inference-time controls for can…

arXiv cs.AI TIER_1 English(EN) · Xin Qiu · 2026-05-28 16:08

Overcoming Forgetting in LLM Fine-Tuning with Evolution Strategies

Evolution Strategies (ES) has recently emerged as a competitive alternative to reinforcement learning (RL) for large language model (LLM) fine-tuning, offering advantages through simplicity, scalability, and inference-only training. However, recent work suggests that ES fine-tuni…

arXiv cs.CL TIER_1 English(EN) · Chen Ma · 2026-05-28 15:46

SEAL: Can Saturated Benchmarks Be Revived with LLM-as-a-Meta-Judge?

Widely used language-model benchmarks are increasingly saturated, with frontier systems often receiving near-tied scores that standard metrics cannot resolve. Rather than constructing harder alternatives, we ask whether existing tasks can be made informative again through improve…

arXiv cs.AI TIER_1 English(EN) · Aren A. Babikian · 2026-05-28 15:05

Projectional Decoding: Towards Semantic-Aware LLM Generation

Large language models (LLMs) are increasingly used to generate software artifacts across many software engineering (SE) tasks, yet ensuring the semantic validity of these artifacts remains a fundamental challenge. Existing constrained decoding techniques can enforce syntactic cor…

arXiv cs.AI TIER_1 English(EN) · Wenhai Wang · 2026-05-28 14:57

Domain-Specific Data Synthesis for LLMs via Minimal Sufficient Representation Learning

Large Language Models have demonstrated remarkable progress in general-purpose capabilities and can achieve strong performance in specific domains through fine-tuning on domain-specific data. However, acquiring high-quality data for target domains remains a significant challenge.…

arXiv cs.CL TIER_1 English(EN) · Mohit Iyyer · 2026-05-28 14:42

Recovering Diversity Without Sacrificing Alignment: A DPO Recipe for Post-Trained LLMs

Many open-ended instructions have multiple valid answers that users can benefit from seeing, but post-training often narrows an LLM's output space toward a small set of canonical responses. We introduce REDIPO, an offline DPO data-construction pipeline for recovering distinct val…

arXiv cs.AI TIER_1 English(EN) · Yuming (Rapheal) Huang, Yao Liu, Lei Wang, Junchen Wan · 2026-05-28 04:00

Let the Results Speak: A Replication-First Paradigm for LLM Behavioral Benchmarking

arXiv:2605.27914v1 Announce Type: cross Abstract: Subjective evaluation of LLM behavior -- empathy, restraint, calibrated emotional tone -- is hard. Human inter-rater agreement on such qualities saturates near rho ~ 0.45, and an LLM-as-judge proxy alone risks circularity: a judge…

arXiv cs.AI TIER_1 English(EN) · Zhenghan Song, Yunyi Li, Yulong Liu · 2026-05-28 04:00

Prefix-Safe Bayesian Belief Tracking for LLM Reasoning Reliability: Separating Calibration from Ranking

arXiv:2605.27712v1 Announce Type: new Abstract: Long reasoning traces need reliability estimates before final answers are known. We study prefix-conditioned eventual-success estimation, $P(y=1 \mid o_{1:t})$, using prefix-safe observations. Sequential Bayesian Belief Tracking (SB…

arXiv cs.AI TIER_1 English(EN) · Hankyeol Kim, Pilsung Kang · 2026-05-28 04:00

Asking Is Not a Cure-All: Protocol Sensitivity in LLM Confidence Calibration

arXiv:2605.27752v1 Announce Type: new Abstract: LLM confidence calibration is often evaluated by comparing two signals: token-probability scores and verbalized confidence. These signals are sometimes treated as direct readouts of model uncertainty, but their comparison depends on…

arXiv cs.AI TIER_1 English(EN) · Bowen Wei, Nan Wang, Yuqing Zhou, Jinhao Pan, Ziwei Zhu · 2026-05-28 04:00

Confidence-Orchestrated Self-Evolution against Uncertain LLM Feedback

arXiv:2605.28010v1 Announce Type: new Abstract: Self-evolving large language models (LLMs) learn by generating their own training tasks and solutions, reducing reliance on human-curated supervision. However, in many reasoning domains, the model must also validate generated tasks …

arXiv cs.AI TIER_1 English(EN) · Pu Li, Jiawen Qi, Qinyu Chen · 2026-05-28 04:00

When NPUs Are Not Always Faster: A Stage-Level Analysis of Mobile LLM Inference

arXiv:2605.27435v1 Announce Type: cross Abstract: Deploying large language models (LLMs) on mobile devices increasingly relies on heterogeneous execution, yet no prior study has systematically characterized NPU effectiveness at the operator and pipeline level. We present the firs…

arXiv cs.AI TIER_1 English(EN) · Yuchao Wu, Wenji Fang, Jing Wang, Wenkai Li, Ziyan Guo, Zhiyao Xie · 2026-05-28 04:00

AssertLLM2: A Comprehensive LLM Benchmark for Generating Assertions from Design Specifications

arXiv:2605.27472v1 Announce Type: cross Abstract: Assertion-based verification (ABV) is a cornerstone of modern hardware design, yet manually translating design intent into formal SystemVerilog Assertions (SVAs) remains labor-intensive and error-prone. While Large Language Models…

arXiv cs.AI TIER_1 English(EN) · Zehao Liu, Yuanpu Cao, Jinghui Chen, Vasant G. Honavar · 2026-05-28 04:00

Recovering the Sweet Spot: Improving LLM Reasoning via Pass-Rate-Weighted Self-Distillation

arXiv:2605.27765v1 Announce Type: cross Abstract: Self-Distillation Policy Optimization (SDPO) provides dense token-level credit assignment for reinforcement learning with large language models by leveraging the model's own feedback-conditioned predictions as a self-teacher. Unli…

arXiv cs.AI TIER_1 English(EN) · Hui Yang, Daiwei He, Kevin Jiang, Taejin Park, Kungang Li, Jiajun Luo, Yuying Chen, Xinyi Zhang, Sihan Wang, Haoyu He, Yu Liu, Lakshmi Manoharan, David Xue, Shubham Barhate, Runze Su, Duna Zhan, Ling Leng, Siping Ji, Jinfeng Zhuang, Alice Wu, Leo Lu, Han… · 2026-05-28 04:00

Fine-Tuned Large Language Models as Complementary Predictors for Improving Ads Systems

arXiv:2605.27856v1 Announce Type: cross Abstract: Recommendation systems power engagement and monetization across feeds, ads, and short-video platforms, but translating the latest advances in Large Language Models into Recommendation Systems (RecSys) gains remains rare, particula…

arXiv cs.LG TIER_1 English(EN) · Zelin Li, Caiwen Ding · 2026-05-28 04:00

LLM Zeroth-Order Fine-Tuning Is an Inference Workload

arXiv:2605.28760v1 Announce Type: new Abstract: Zeroth-order (ZO) fine-tuning is attractive for large language models because it replaces backpropagation with forward objective evaluations. Existing implementations nevertheless execute ZO algorithms inside conventional training l…

arXiv cs.AI TIER_1 English(EN) · Kerui Peng, Feifei Li, Xingyu Fan, Wenhui Que · 2026-05-28 04:00

Semantic Flow Regularization: Teaching LLMs to Generate Diverse and Coherent Responses

arXiv:2605.27971v1 Announce Type: cross Abstract: When large language models are fine-tuned to generate persona- or tone-conditioned responses, their output diversity is severely limited--a failure we term Cross-Style Collapse. We trace this collapse to the cross-entropy objectiv…

arXiv cs.AI TIER_1 English(EN) · Leonardo Matthew Yauw, Wei-Bin Kou, Yujiu Yang · 2026-05-28 04:00

Ensemble and Cross-Architecture Interpretation of Large Language Model Reasoning

arXiv:2605.28006v1 Announce Type: cross Abstract: Understanding how LLMs reason is hindered by a practical asymmetry: while their generated outputs are observable, the underlying reasoning patterns remain opaque. Relying on single probes, such as Mutual Information Peak (MIP) or …

arXiv cs.AI TIER_1 English(EN) · Jiazhen Huang, Xiao Chen, Xiao Luo, Yong Dai, Senkang Hu, Yuzhi Zhao · 2026-05-28 04:00

Skill-Conditioned Gated Self-Distillation for LLM Reasoning

arXiv:2605.28791v1 Announce Type: cross Abstract: On-policy self-distillation (SD) improves LLM reasoning by using teacher-side privileged information (PI) to turn sparse verifier outcomes into dense token-level supervision. Existing methods usually assume trusted PI, such as ref…

arXiv cs.AI TIER_1 English(EN) · Yutong Wang, Pengliang Ji, Chaoqun Yang, Kaixin Li, Ming Hu, Jiaoyang Li, Guillaume Sartoretti · 2026-05-28 04:00

MCTS-Judge: Test-Time Scaling in LLM-as-a-Judge for Code Correctness Evaluation

arXiv:2502.12468v2 Announce Type: replace-cross Abstract: The LLM-as-a-Judge paradigm shows promise for evaluating generative content but lacks reliability in reasoning-intensive scenarios, such as programming. Inspired by recent advances in reasoning models and shifts in scaling…

arXiv cs.AI TIER_1 English(EN) · Pengkai Wang, Pengwei Liu, Qi Zuo, Zhijie Sang, Congkai Xie, Hongxia Yang · 2026-05-28 04:00

InfiMed-ORBIT: Aligning Large Language Models on Open-Ended Complex Tasks via Rubric-Based Incremental Training

arXiv:2510.15859v4 Announce Type: replace-cross Abstract: Reinforcement learning (RL) has driven recent breakthroughs in large language models (LLMs), especially for tasks where rewards can be computed automatically, such as code generation. However, it is less effective in open-…

arXiv cs.CL TIER_1 English(EN) · Jiayong Wan, Jiawei Chen, Zhaoxia Yin, Liu Shuyuan, Hang Su · 2026-05-28 04:00

LCO: LLM-Based Constrained Optimization for Safer Agentic LLMs in Real-World Tasks

arXiv:2605.27375v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly acting as autonomous agents, but their continuous interaction with the environment can lead to in-context reward hacking (ICRH), a phenomenon where LLMs iteratively optimize their behavi…

arXiv cs.CL TIER_1 English(EN) · Pitipat Kongsomjit, Suryansh Goyal, Jacob Whitehill · 2026-05-28 04:00

Learning to Translate from Soft to Hard LLM Prompts

arXiv:2605.27642v1 Announce Type: new Abstract: Soft prompt tuning is a parameter-efficient method for adapting LLMs to specific tasks, but suffers from a lack of interpretability. Building on recent work on interpreting soft prompts (Ramati et al., 2024), we explore how training…

arXiv cs.CL TIER_1 English(EN) · Haihui Pan, Junwei Bao, Hongfei Jiang, Yang Song · 2026-05-28 04:00

FABSVer: Faster Training and Better Self-Verification for LLM Mathematical Reasoning

arXiv:2605.28389v1 Announce Type: new Abstract: While large language models have made significant progress in mathematical reasoning, they remain unreliable at judging the correctness of their own solutions. Existing approaches that equip models with self-verification typically t…

arXiv cs.LG TIER_1 English(EN) · Binh-Nguyen Nguyen, Khang Tran, NhatHai Phan, Issa Khalil · 2026-05-28 04:00

Gradient Transformer: Learning to Generate Updates for LLMs

arXiv:2605.27591v1 Announce Type: new Abstract: Many organizations lack computational resources to fine-tune large language models (LLMs) on private (unshareable) data for better utility, while fine-tuning tiny language models (TinyLMs) alone performs poorly. To address this bott…

arXiv cs.NE (Neural & Evolutionary) TIER_1 English(EN) · Jianguo Zhang · 2026-05-28 03:22

EvoGM: Learning to Merge Large Language Models via Evolutionary Generative Optimization

Evolutionary model merging provides a powerful framework for the automated, training-free composition of LLMs through parameter-space search. However, existing methods predominantly rely on stochastic, hand-crafted operators that overlook the underlying performance landscape of t…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-28 00:00

How Does LoRA Memorize? Parametric Memory Laws for LLM Fine-Tuning

Research investigates the quantitative limits of parametric memory in large language models using LoRA as a probe, establishing a power law relationship and developing a threshold-guided optimization method for improved memory performance.

arXiv cs.IR (Information Retrieval) TIER_1 English(EN) · Dingyan Shang · 2026-05-27 17:57

When Large Language Model Reward Design Fails: Diagnosis-Driven Optimization for Sparse Structured Reinforcement Learning

For sparse, structured reinforcement-learning tasks with semantic reward-function interfaces, LLM-generated reward shaping is better framed as debugging than one-shot generation. We study PPO-trained agents using MiniGrid as core evaluation and MuJoCo as boundary stress test. Our…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-27 17:49

Skill-Conditioned Gated Self-Distillation for LLM Reasoning

On-policy self-distillation (SD) improves LLM reasoning by using teacher-side privileged information (PI) to turn sparse verifier outcomes into dense token-level supervision. Existing methods usually assume trusted PI, such as reference answers or successful traces. We ask whethe…

arXiv cs.AI TIER_1 English(EN) · Yuzhi Zhao · 2026-05-27 17:49

Skill-Conditioned Gated Self-Distillation for LLM Reasoning

On-policy self-distillation (SD) improves LLM reasoning by using teacher-side privileged information (PI) to turn sparse verifier outcomes into dense token-level supervision. Existing methods usually assume trusted PI, such as reference answers or successful traces. We ask whethe…

arXiv cs.LG TIER_1 English(EN) · Caiwen Ding · 2026-05-27 17:19

LLM Zeroth-Order Fine-Tuning Is an Inference Workload

Zeroth-order (ZO) fine-tuning is attractive for large language models because it replaces backpropagation with forward objective evaluations. Existing implementations nevertheless execute ZO algorithms inside conventional training loops, even though their dominant work is repeate…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-27 17:19

LLM Zeroth-Order Fine-Tuning Is an Inference Workload

Zeroth-order (ZO) fine-tuning is attractive for large language models because it replaces backpropagation with forward objective evaluations. Existing implementations nevertheless execute ZO algorithms inside conventional training loops, even though their dominant work is repeate…

arXiv cs.CL TIER_1 English(EN) · Yang Song · 2026-05-27 12:26

FABSVer: Faster Training and Better Self-Verification for LLM Mathematical Reasoning

While large language models have made significant progress in mathematical reasoning, they remain unreliable at judging the correctness of their own solutions. Existing approaches that equip models with self-verification typically treat solution generation and verification as two…

arXiv cs.AI TIER_1 English(EN) · Paul Sigloch, Christoph Benzmüller · 2026-05-27 04:00

Neuro-Symbolic Verification of LLM Outputs for Data-Sensitive Domains (Extended Preprint)

arXiv:2605.26942v1 Announce Type: new Abstract: LLMs deployed in high-stakes domains face fundamental reliability challenges: hallucinations, inconsistencies, and privacy vulnerabilities introduce unacceptable risks where errors carry legal, financial, or safety consequences. Thi…

arXiv cs.AI TIER_1 English(EN) · Allen Nie, Xavier Daull, Zhiyi Kuang, Abhinav Akkiraju, Anish Chaudhuri, Max Piasevoli, Ryan Rong, YuCheng Yuan, Prerit Choudhary, Shannon Xiao, Rasool Fakoor, Adith Swaminathan, Ching-An Cheng · 2026-05-27 04:00

Understanding the Challenges of LLMs in Iterative Generative Optimization

arXiv:2603.23994v2 Announce Type: replace-cross Abstract: Generative optimization uses large language models (LLMs) to iteratively improve artifacts (such as code, workflows or prompts) using execution feedback. It is a promising approach to building self-improving agents, yet in…

arXiv cs.AI TIER_1 English(EN) · Mind Lab: Song Cao, Vic Cao, Andrew Chen, Kaijie Chen, Cleon Cheng, Steven Chiang, Kaixuan Fan, Hera Feng, Huan Feng, Arthur Fu, Jun Gao, Hongquan Gu, Aaron Guan, Nolan Ho, Mutian Hong, Hailee Hou, Peixuan Hua, Charles Huang, Miles Jiang, Nora Jiang,… · 2026-05-27 04:00

MinT: Managed Infrastructure for Training and Serving Millions of Large Language Models

arXiv:2605.13779v2 Announce Type: replace-cross Abstract: We present MindLab Toolkit (MinT), a managed infrastructure system for Low-Rank Adaptation (LoRA) post-training and online serving. MinT targets a setting where many trained policies are produced over a small number of exp…

arXiv cs.CL TIER_1 English(EN) · Faeze Ghorbanpour, Alexander Fraser · 2026-05-27 04:00

A Sensitivity Study of Instruction-Tuned Large Language Models to Harmful Sentences in Long Inputs

arXiv:2510.05864v2 Announce Type: replace Abstract: Large language models (LLMs) increasingly operate on long inputs, yet their behavior when harmful sentences are sparsely embedded within such inputs remains poorly understood. We present a sensitivity analysis that probes how LL…

arXiv cs.AI TIER_1 English(EN) · Corentin Kervadec, Iuliia Lysova, Iuri Macocco, Marco Baroni, Gemma Boleda · 2026-05-27 04:00

Tracing Computation Density in LLMs

arXiv:2605.27033v1 Announce Type: cross Abstract: Transformer-based large language models (LLMs) are comprised of billions of parameters arranged in deep and wide computational graphs, but it is not clear that they exploit their full capacity for all inputs. We introduce the s-Tr…

arXiv cs.AI TIER_1 English(EN) · Xin Huang, Antoni B. Chan · 2026-05-27 04:00

Faithfulness Evaluation for Decoder-Only Large Language Model Attribution with Controlled Retained Information

arXiv:2601.03089v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) are increasingly evaluated with input attribution methods, yet comparing such explanations remains challenging. Existing soft-perturbation faithfulness metrics, such as Soft-NC and Soft-NS, can…

arXiv cs.AI TIER_1 English(EN) · Han Jiang, Dongyao Zhu, Zhihua Wei, Xiaoyuan Yi, Ziang Xiao, Xing Xie · 2026-05-27 04:00

PICACO: Pluralistic In-Context Value Alignment of LLMs via Total Correlation Optimization

arXiv:2507.16679v3 Announce Type: replace-cross Abstract: In-Context Learning has shown great potential for aligning Large Language Models (LLMs) with human values, helping reduce harmful outputs and accommodate diverse preferences without costly post-training, known as In-Contex…

arXiv cs.AI TIER_1 English(EN) · Yi Jing, Zao Dai, Jinwu Hu, Zijun Yao, Lei Hou, Juanzi Li, Xiaozhi Wang · 2026-05-27 04:00

Guiding Large Language Model (LLM) Post-Training Data Engineering with Model Internals from Sparse Autoencoders

arXiv:2605.27354v1 Announce Type: cross Abstract: Model internals encode rich information about how a large language model (LLM) processes its training data; however, post-training data engineering largely relies on external signals and ignores rich intrinsic signals lying in mod…

arXiv cs.AI TIER_1 English(EN) · Wenhui Tan, Minghao Li, Xiaoqian Ma, Siqi Fan, Xiusheng Huang, Liujie Zhang, Ruihua Song, Weihang Chen · 2026-05-27 04:00

Pairs In, Pairs Out: Latent Multi-Token Prediction for Efficient Large Language Models

arXiv:2605.27255v1 Announce Type: cross Abstract: Long chain-of-thought reasoning has made autoregressive decoding the dominant inference cost of modern large language models. Existing methods target either the input side (latent compression) or the output side (speculative decod…

arXiv cs.AI TIER_1 English(EN) · Xiongwei Zhu, Xiaojian Liao, Tianyang Jiang, Yusen Zhang, Liang Wang, Limin Xiao · 2026-05-27 04:00

ReMoE: Improving Expert Reuse in Memory-Constrained MoE LLM Inference via Router Fine-Tuning

arXiv:2605.27081v1 Announce Type: cross Abstract: Fine-grained Mixture-of-Experts (MoE) models sparsely activate only a subset of experts per token, reducing activated computation while maintaining high model capacity. However, in memory-constrained inference scenarios, only a sm…

arXiv cs.CL TIER_1 English(EN) · Ishir Garg, Neel Kolhe, Xuandong Zhao, Dawn Song · 2026-05-27 04:00

InfoSynth: Information-Guided Benchmark Synthesis for LLMs

arXiv:2601.00575v2 Announce Type: replace Abstract: Large language models (LLMs) have demonstrated significant advancements in reasoning and code generation, but efficiently creating new benchmarks to evaluate these capabilities remains a challenge. Traditional benchmark creation…

arXiv cs.AI TIER_1 English(EN) · Adnan Rashid · 2026-05-27 04:00

ReasonOps: A Unified Operational Paradigm for Trustworthy Verification of Large Language Model Reasoning

arXiv:2605.27014v1 Announce Type: cross Abstract: Large Language Models (LLMs) have transformed artificial intelligence from primarily generative systems into increasingly capable reasoning agents. Recent advances in theorem proving, autoformalization, symbolic reasoning, and too…

arXiv cs.IR (Information Retrieval) TIER_1 English(EN) · Zhifang Liu · 2026-05-27 02:19

Fine-Tuned Language Models as Complementary Predictors for Improving Ads Systems

Recommendation systems power engagement and monetization across feeds, ads, and short-video platforms, but translating the latest advances in Large Language Models into Recommendation Systems (RecSys) gains remains rare, particularly in advertising and production-scale real-world…

arXiv cs.IR (Information Retrieval) TIER_1 Dansk(DA) · Jiaxuan You · 2026-05-27 01:04

LRanker: An LLM Ranker for Large-Scale Candidate Sets

Large language models (LLMs) have recently shown strong potential for ranking by capturing semantic relevance and adapting across diverse domains, yet existing methods remain constrained by limited context length and high computational costs, restricting their applicability to re…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-27 00:00

RUBRIC-ARROW: Alternating Pointwise Evaluation Reward Modeling for LLM Post-Training in Non-Verifiable Domains

RUBRIC-ARROW presents an alternating framework for reward modeling that improves upon rubric-based methods by reducing ties and leveraging pairwise preference data for training.

arXiv cs.AI TIER_1 English(EN) · Xiaozhi Wang · 2026-05-26 17:55

Guiding LLM Post-Training Data Engineering with Model Internals from Sparse Autoencoders

Model internals encode rich information about how a large language model (LLM) processes its training data; however, post-training data engineering largely relies on external signals and ignores rich intrinsic signals lying in model internals. We propose SAERL, a data engineering…

arXiv cs.AI TIER_1 English(EN) · Weihang Chen · 2026-05-26 16:31

Pairs In, Pairs Out: Latent Multi-Token Prediction for Efficient Large Language Models

Long chain-of-thought reasoning has made autoregressive decoding the dominant inference cost of modern large language models. Existing methods target either the input side (latent compression) or the output side (speculative decoding and multi-token prediction, MTP), but the two …

arXiv cs.AI TIER_1 English(EN) · Limin Xiao · 2026-05-26 14:32

ReMoE: Improving Expert Reuse in Memory-Constrained MoE LLM Inference via Router Fine-Tuning

Fine-grained Mixture-of-Experts (MoE) models sparsely activate only a subset of experts per token, reducing activated computation while maintaining high model capacity. However, in memory-constrained inference scenarios, only a small set of experts can be cached. Experts not in t…

arXiv cs.AI TIER_1 English(EN) · Gemma Boleda · 2026-05-26 13:55

Tracing Computation Density in LLMs

Transformer-based large language models (LLMs) are comprised of billions of parameters arranged in deep and wide computational graphs, but it is not clear that they exploit their full capacity for all inputs. We introduce the s-Trace method to efficiently estimate the subgraph of…

arXiv cs.AI TIER_1 English(EN) · Adnan Rashid · 2026-05-26 13:32

ReasonOps: A Unified Operational Paradigm for Trustworthy Verification of Large Language Model Reasoning

Large Language Models (LLMs) have transformed artificial intelligence from primarily generative systems into increasingly capable reasoning agents. Recent advances in theorem proving, autoformalization, symbolic reasoning, and tool-augmented language models demonstrate substantia…

arXiv cs.AI TIER_1 English(EN) · Christoph Benzmüller · 2026-05-26 12:32

Neuro-Symbolic Verification of LLM Outputs for Data-Sensitive Domains (Extended Preprint)

LLMs deployed in high-stakes domains face fundamental reliability challenges: hallucinations, inconsistencies, and privacy vulnerabilities introduce unacceptable risks where errors carry legal, financial, or safety consequences. This paper presents a hybrid verification architect…

arXiv cs.AI TIER_1 English(EN) · Minwei Kong, Chonghe Jiang, Ao Qu, Wenbin Ouyang, Zhaoming Zeng, Xiaotong Guo, Zhekai Li, Junyi Li, Yi Fan, Xinshou Zheng, Xi Jing, Yikai Zhang, Zhiwei Liang, Seonghoo Kim, Runqing Yang, Zijian Zhou, Sirui Li, Han Zheng, Wangyang Ying, Ou Zheng, Chonghua… · 2026-05-26 04:00

FrontierOR: Benchmarking LLMs on Efficient Algorithm Design for Large-Scale Optimization

arXiv:2605.25246v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used for optimization modeling and solver-code generation, yet practical operations research and optimization problems often require a harder capability: designing scalable algorithms th…

arXiv cs.LG TIER_1 English(EN) · Haoyu Zheng, Yongqiang Zhang, Fangcheng Fu, Xiaokai Zhou, Hao Luo, Hongchao Zhu, Yuanyuan Zhu, Hao Wang, Xiao Yan, Jiawei Jiang · 2026-05-26 04:00

Scheduling LLM Inference with Uncertainty-Aware Output Length Predictions

arXiv:2604.00499v2 Announce Type: replace Abstract: To schedule LLM inference, the shortest job first (SJF) principle is favorable by prioritizing requests with short output lengths to avoid head-of-line (HOL) blocking. Existing methods usually predict a single output le…

arXiv cs.LG TIER_1 English(EN) · Daniel Barley, Jonathan Leis, Benjamin Klenk, Holger Fröning · 2026-05-26 04:00

A Tabular Schedule Abstraction for Evaluating Communication-Aware Pipeline-Parallel Large-Model Training

arXiv:2605.24006v1 Announce Type: cross Abstract: Pipeline parallelism is a key technique for distributed training of large language models because it reduces per-device parameter and activation memory. However, comparing pipeline schedules is difficult: analytical models expose …

arXiv cs.LG TIER_1 English(EN) · Zili Zhang, Chengxu Yang, Shenglong Zhang, Chenyu Wang, Yufan Zhang, Tuo Dai, Zhouyang Li, Yuhong Ge, Chao Jin, Xin Jin, Yuliang Liu · 2026-05-26 04:00

BigMac: Breaking the Compute-Memory Pareto Frontier of Multimodal Large-Model Training

arXiv:2605.25451v1 Announce Type: new Abstract: Training multimodal large language models (MLLMs) is challenged by both model and data heterogeneity. Existing systems redesign the training pipeline to address these challenges, but remain bound by a Pareto frontier between compute…

arXiv cs.LG TIER_1 English(EN) · Enayat Ullah, Sai Aparna Aketi, Devansh Gupta, Huanyu Zhang, Meisam Razaviyayn · 2026-05-26 04:00

Efficient DP-SGD with Random Clipping for Large Language Models

arXiv:2605.24879v1 Announce Type: new Abstract: Large language models (LLMs) are trained on vast datasets that may contain sensitive information. Differential privacy (DP), the de facto standard for formal privacy guarantees, provides a principled framework for training LLMs with…

arXiv cs.LG TIER_1 English(EN) · Ke Sun, Yizhou Zhao, Jiayi Xin, Qi Long, Weijie Su · 2026-05-26 04:00

CurveRL: Principled, Distribution-Aware Context Reweighting for LLM Reasoning

arXiv:2605.24331v1 Announce Type: new Abstract: Context or prompt-level reweighting has emerged as a central algorithmic lever in Reinforcement Learning with Verified Rewards (RLVR) for improving the reasoning capability of large language models, yet the principle determining wha…

arXiv cs.CL TIER_1 English(EN) · Peijie Jiang, Yuqi Feng, Cunyin Peng, Qian Zhao, Jia Liu, KunLong Chen, Zhiqiang Zhang, Jun Zhou · 2026-05-26 04:00

PowLU: An Activation Function for Stable LLM Pre-Training

arXiv:2605.25704v1 Announce Type: new Abstract: In contemporary large language models (LLMs), the swish-gated linear unit (SwiGLU) activation function is widely adopted to regulate the information flow and introduce non-linearity. For large positive inputs, SwiGLU approximates th…

arXiv cs.CL TIER_1 English(EN) · Xiangdong Zhang, Debing Zhang, Shaofeng Zhang, Xiaohan Qin, Yu Cheng, Junchi Yan · 2026-05-26 04:00

NITP: Next Implicit Token Prediction for LLM Pre-Training

arXiv:2605.24956v1 Announce Type: new Abstract: Standard next-token prediction (NTP) supervises language models solely through discrete labels in the output logit space. We argue that this sparse one-hot supervision leaves the latent representation space under-constrained, allowi…

arXiv cs.AI TIER_1 English(EN) · Siyuan Liu, Tinghong Chen, Xinghan Li, Yifei Wang, Jingzhao Zhang · 2026-05-26 04:00

Data Difficulty and the Generalization-Extrapolation Trade-off in LLM Fine-Tuning

arXiv:2605.12906v2 Announce Type: replace-cross Abstract: Data selection during supervised fine-tuning (SFT) can critically change the behavior of large language models (LLMs). Although existing work has studied the effect of selecting data based on heuristics such as perplexity,…

arXiv cs.AI TIER_1 English(EN) · Ruishuo Chen, Yu Chen, Zhuoran Li, Longbo Huang · 2026-05-26 04:00

PowerFlow: Unlocking the Dual Nature of LLMs via Principled Distribution Matching

arXiv:2603.18363v2 Announce Type: replace-cross Abstract: Unsupervised Reinforcement Learning from Internal Feedback (RLIF) has emerged as a promising paradigm for eliciting the latent capabilities of Large Language Models (LLMs) without external supervision. However, current met…

arXiv cs.AI TIER_1 English(EN) · Haojie Ouyang, Jianwei Lv, Lei Ren, Chen Wei, Xiaojie Wang, Fangxiang Feng · 2026-05-26 04:00

ChunkLLM: A Lightweight Pluggable Framework for Accelerating Large Language Model Inference

arXiv:2510.02361v2 Announce Type: replace-cross Abstract: Transformer-based large models excel in natural language processing and computer vision, but face severe computational inefficiencies due to the self-attention's quadratic complexity with input tokens. Recently, researcher…

arXiv cs.AI TIER_1 English(EN) · Zhuchen Cao, Sven Apel, Adish Singla, Vera Demberg · 2026-05-26 04:00

Pragmatic Reasoning improves LLM Code Generation

arXiv:2502.15835v5 Announce Type: replace-cross Abstract: Pragmatic reasoning helps interlocutors infer intended meaning from ambiguous or underspecified messages by considering shared context and counterfactual alternatives. Similar challenges arise in natural language-to-code g…

arXiv cs.AI TIER_1 English(EN) · Akira Okutomi · 2026-05-26 04:00

False Fixed Points: Kantian Feedback, Stable Miscalibration, and Representation Compression in Large Language Models

arXiv:2510.14925v4 Announce Type: replace Abstract: High-confidence errors in large language models are often treated as fragile failures. We study an alternative: some errors may be false fixed points, locally stable, internally coherent, and confidently wrong. This separates ro…

arXiv cs.AI TIER_1 English(EN) · Parth Darshan, Abhishek Divekar · 2026-05-26 04:00

When Gradients Collide: Failure Modes of Multi-Objective Prompt Optimization for LLM Judges

arXiv:2605.26046v1 Announce Type: cross Abstract: Customizing an LLM judge to a specific task or domain often involves optimizing its prompt across multiple evaluation criteria simultaneously. Textual gradient methods automate this for a single judge criterion, however they produ…

arXiv cs.AI TIER_1 English(EN) · Muyu Pan, Shu Zhao, Nan Zhang, Philip Shin, Varun Parekh, Vijaykrishnan Narayanan, Rui Zhang · 2026-05-26 04:00

TIAR: Trajectory-Informed Advantage Reweighting for LLM Abstention Learning

arXiv:2605.25850v1 Announce Type: cross Abstract: This paper investigates large language model (LLM) abstention learning, specifically using ternary reward, which incentivize truthfulness in large language models. This paper extends that idea by moving from a ternary reward to a …

arXiv cs.AI TIER_1 English(EN) · Haoran Gu, Handing Wang, Yi Mei, Mengjie Zhang · 2026-05-26 04:00

AutoSG: LLM-Driven Solver Generation for Expensive Optimization from Task Prompts Alone

arXiv:2605.25658v1 Announce Type: cross Abstract: Expensive optimization tasks are ubiquitous in real-world applications, demanding highly specialized solvers. While LLM-driven automated solver generation shows promise, current paradigms face three critical issues when tackling e…

arXiv cs.AI TIER_1 English(EN) · Xiangtian Ji, Yuxin Chen, Zhengzhou Cai, Xiang Wang, An Zhang, Tat-Seng Chua · 2026-05-26 04:00

Tiny Brains, Huge Impact: Revealing Critical Neurons in Large Language Models with Only a Few Prompts

arXiv:2605.24846v1 Announce Type: cross Abstract: Large language models (LLMs) display strong comprehensive abilities, yet the internal mechanisms that support these behaviors remain insufficiently understood. In this work, we show that across a wide range of open-weight Transfor…

arXiv cs.AI TIER_1 English(EN) · Jaeung Lee, Dohyun Kim, Jaemin Jo · 2026-05-26 04:00

Measuring the Depth of LLM Unlearning via Activation Patching

arXiv:2605.24614v1 Announce Type: cross Abstract: Large language model (LLM) unlearning has emerged as a crucial post-hoc mechanism for privacy protection and AI safety, yet auditing whether target knowledge is truly erased remains challenging. Existing output-level metrics fail …

arXiv cs.AI TIER_1 English(EN) · Haizhou Xia · 2026-05-26 04:00

Guarded Repair with Harm-Aware Post-Hoc Replacement for LLM Mathematical Reasoning

arXiv:2605.24613v1 Announce Type: cross Abstract: Post-hoc repair of LLM mathematical reasoning introduces an asymmetric risk: fixing an incorrect reasoning trace is useful, but replacing a trace that was already correct can be harmful. We study this problem under a selective rep…

arXiv cs.AI TIER_1 English(EN) · João Sedoc, Baotong Zhang, Dean Foster · 2026-05-26 04:00

Trust but Verify: Prover-Verifier Deliberation for Selective LLM Prediction

arXiv:2605.25133v1 Announce Type: new Abstract: Reliably knowing when a language model is correct is almost as important as being correct. We introduce prover-verifier deliberation (PVD), an inference-time protocol grounded in interactive proof theory, as a mechanism for selectiv…

arXiv cs.AI TIER_1 English(EN) · Jingchu Gai, Guanning Zeng, Christina Baek, Chen Wu, J. Zico Kolter, Andrej Risteski, Aditi Raghunathan · 2026-05-26 04:00

Understanding and Mitigating Premature Confidence to Improve Large Language Model Reasoning

arXiv:2605.24396v1 Announce Type: new Abstract: Long chains of thought (CoT) from current language models frequently contain logical gaps and unjustified leaps, limiting the gains from additional test-time compute. Improving reasoning quality directly would require process reward…

arXiv cs.AI TIER_1 English(EN) · Ashok Chandrasekar, Jason Kramberger · 2026-05-26 04:00

Identifying and Mitigating Systematic Measurement Bias in Production LLM Inference Benchmarks

arXiv:2605.24217v1 Announce Type: new Abstract: As Large Language Models (LLMs) transition from research environments to production deployments, evaluating their performance against strict Service Level Objectives (SLOs) has become critical. However, current evaluation methodolog…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-26 00:00

Guiding Large Language Model (LLM) Post-Training Data Engineering with Model Internals from Sparse Autoencoders

SAERL uses Sparse Autoencoder-derived signals from model internals to enhance LLM reinforcement learning through diversity control, difficulty-aware curriculum learning, and quality-based data filtering.

arXiv cs.AI TIER_1 English(EN) · Abhishek Divekar · 2026-05-25 17:08

When Gradients Collide: Failure Modes of Multi-Objective Prompt Optimization for LLM Judges

Customizing an LLM judge to a specific task or domain often involves optimizing its prompt across multiple evaluation criteria simultaneously. Textual gradient methods automate this for a single judge criterion, however they produce natural-language critiques, not numerical vecto…

arXiv cs.AI TIER_1 English(EN) · Rui Zhang · 2026-05-25 13:42

TIAR: Trajectory-Informed Advantage Reweighting for LLM Abstention Learning

This paper investigates large language model (LLM) abstention learning, specifically using ternary reward, which incentivize truthfulness in large language models. This paper extends that idea by moving from a ternary reward to a Trajectory-Informed advantage reweighting, dynamic…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-25 13:42

TIAR: Trajectory-Informed Advantage Reweighting for LLM Abstention Learning

This paper investigates large language model (LLM) abstention learning, specifically using ternary reward, which incentivize truthfulness in large language models. This paper extends that idea by moving from a ternary reward to a Trajectory-Informed advantage reweighting, dynamic…

arXiv cs.LG TIER_1 English(EN) · Jun Zhou · 2026-05-25 11:02

PowLU: An Activation Function for Stable LLM Pre-Training

In contemporary large language models (LLMs), the swish-gated linear unit (SwiGLU) activation function is widely adopted to regulate the information flow and introduce non-linearity. For large positive inputs, SwiGLU approximates the quadratic function $x^2$, providing strong non…

arXiv cs.AI TIER_1 English(EN) · Mengjie Zhang · 2026-05-25 10:04

AutoSG: LLM-Driven Solver Generation for Expensive Optimization from Task Prompts Alone

Expensive optimization tasks are ubiquitous in real-world applications, demanding highly specialized solvers. While LLM-driven automated solver generation shows promise, current paradigms face three critical issues when tackling expensive optimization: factual hallucinations due …

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-25 10:04

AutoSG: LLM-Driven Solver Generation for Expensive Optimization from Task Prompts Alone

Expensive optimization tasks are ubiquitous in real-world applications, demanding highly specialized solvers. While LLM-driven automated solver generation shows promise, current paradigms face three critical issues when tackling expensive optimization: factual hallucinations due …

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-25 08:26

AnE: Pushing the Reasoning Frontier of Multimodal Large Language Models via Anchor Evolution

Post-training via Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) is crucial for enhancing reasoning in Multimodal Large Language Models (MLLMs), yet existing paradigms often reach a performance bottleneck due to the limitations of static data. While current methods …

arXiv cs.AI TIER_1 English(EN) · Zihao Jing, Qiuhao Zeng, Ruiyi Fang, Yan Yi Li, Yan Sun, Boyu Wang, Pingzhao Hu · 2026-05-25 04:00

A Perception Adapter for Structural LLM Reasoning

arXiv:2602.02780v3 Announce Type: replace Abstract: Large language models (LLMs) are enabling reasoning over 2D and 3D structures, yet existing methods remain modality-specific and typically compress structural inputs through sequence-based tokenization or fixed-length query conn…

arXiv cs.AI TIER_1 English(EN) · Thanh Q. Tran, Arun Verma, Kiwan Wong, Bryan Kian Hsiang Low, Daniela Rus, Wei Xiao · 2026-05-25 04:00

BarrierSteer: LLM Safety via Learning Barrier Steering

arXiv:2602.20102v2 Announce Type: replace-cross Abstract: Despite the strong performance of large language models (LLMs) across diverse tasks, their susceptibility to adversarial attacks and unsafe content generation remains a significant obstacle to deployment, particularly in h…

arXiv cs.AI TIER_1 English(EN) · Luze Sun, Anshuman Suri, Harsh Chaudhari, Cristina Nita-Rotaru, Alina Oprea · 2026-05-25 04:00

PoisonForge: A Benchmark for Task-Level Targeted Poisoning of Instruction-Tuned LLMs

arXiv:2605.23168v1 Announce Type: cross Abstract: When practitioners fine-tune LLMs on unvetted datasets, an adversary can exploit the data supply chain through task-level poisoning: inserting a small number of crafted instruction-response pairs that cause the model to embed atta…

arXiv cs.AI TIER_1 English(EN) · Chuyifei Zhang, Hongyu Cui, Xiaowen Huang, Jitao Sang · 2026-05-25 04:00

Positional Errors in Long-Context LLMs: A Blind Spot in Reasoning Benchmarks

arXiv:2605.23170v1 Announce Type: cross Abstract: Position-controlled evaluation is standard for retrieval tasks such as Needle-in-a-Haystack and RULER, but mainstream reasoning benchmarks do not control positional placement of target tasks in long contexts. We audit 11 long-cont…

arXiv cs.AI TIER_1 English(EN) · Ziyue Liu, Zhengyang Wang, Ruijie Zhang, Avinash Maurya, Hui Zhou, Paul Hovland, Sheng Di, Franck Cappello, Bogdan Nicolae, Zheng Zhang · 2026-05-25 04:00

ReCoVer: A Resilient LLM Pre-Training System via Fault-Tolerant Collectives and Versatile Workloads

arXiv:2605.11215v2 Announce Type: replace-cross Abstract: Pre-training large language models on massive GPU clusters has made hardware faults routine rather than rare, driving the need for resilient training systems. Yet existing frameworks either focus on specific parallelism sc…

arXiv cs.LG TIER_1 English(EN) · Wei Lin, Yining Jiang, Qingyu Song, Qiao Xiang, Hong Xu · 2026-05-25 04:00

AGZO: Activation-Guided Zeroth-Order Optimization for LLM Fine-Tuning

arXiv:2601.17261v4 Announce Type: replace Abstract: Zeroth-Order (ZO) optimization has emerged as a promising solution for fine-tuning LLMs under strict memory constraints, as it avoids the prohibitive memory cost of storing activations for backpropagation. However, existing ZO m…

arXiv cs.LG TIER_1 English(EN) · Mohammad R. Rezaei, Rahul G. Krishnan · 2026-05-25 04:00

From Residuals to Reasons: LLM-Based Mechanism Inference for Tabular Data

arXiv:2605.22897v1 Announce Type: new Abstract: A persistent challenge in machine learning for scientific applications is jointly achieving prediction and understanding. Statistical models excel on structured data but operate as black boxes, while existing interpretability method…

arXiv cs.AI TIER_1 English(EN) · Sixing Chen, Ji-An Li, Saner Cakir, Sinan Akcali, Kayla Lee, Marcelo G. Mattar · 2026-05-25 04:00

Extracting Search Trees from LLM Reasoning Traces Reveals Myopic Planning

arXiv:2605.06840v5 Announce Type: replace Abstract: Large language models (LLMs), especially reasoning models, generate extended chain-of-thought (CoT) reasoning that often contains explicit deliberation over future outcomes. Yet whether this deliberation constitutes genuine plan…

arXiv cs.AI TIER_1 English(EN) · Yiwen Duan, Jing Ye, Xinpei Zhao · 2026-05-25 04:00

ALIVE: Awakening LLM Reasoning via Adversarial Learning and Instructive Verbal Evaluation

arXiv:2602.05472v2 Announce Type: replace Abstract: The quest for expert-level reasoning in Large Language Models (LLMs) has been hampered by a persistent reward bottleneck: traditional reinforcement learning (RL) relies on scalar rewards that are costly to scal…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-25 00:00

When Gradients Conflict: Failure Modes of Multi-Objective Prompt Optimization for LLM Judges

Multi-objective LLM judge customization using textual gradients faces challenges from gradient dilution and instruction interference that limit optimization effectiveness.

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-24 00:00

NITP: Next Implicit Token Prediction for LLM Pre-Training

Next Implicit Token Prediction enhances language model training by adding dense continuous supervision in representation space, improving generalization and performance across model sizes with minimal computational overhead.

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-23 00:00

Measuring the Depth of LLM Unlearning via Activation Patching

A new metric called Unlearning Depth Score (UDS) is introduced to evaluate how thoroughly knowledge has been removed from large language models, addressing limitations of previous methods that could not detect hidden knowledge in internal representations.

arXiv cs.LG TIER_1 English(EN) · Jialin Chen, Aosong Feng, Harshit Verma, Siyi Gu, Haiwen Wang, Ali Maatouk, Yixuan He, Yifeng Gao, Leandros Tassiulas, Rex Ying · 2026-05-22 04:00

Reasoning through Verifiable Forecasting Actions: Consistency-Based Reinforcement Learning for Financial LLMs

arXiv:2605.21975v1 Announce Type: new Abstract: Financial markets are characterized by extreme non-stationarity, low signal-to-noise ratios, and strong dependence on external information such as news, company fundamentals, and macroeconomic signals. Yet, existing approaches eithe…

arXiv cs.LG TIER_1 English(EN) · Shuo Yang, Jinda Lu, Kexin Huang, Chiyu Ma, Shaohang Wei, Yuyang Liu, Guoyin Wang, Jingren Zhou, Li Yuan · 2026-05-22 04:00

Unidirectional Policy Optimization for Self-Evolving Large Language Models

arXiv:2605.22156v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has become a promising paradigm for scaling reasoning capabilities of Large Language Models (LLMs). However, the sparsity of binary verifier rewards often leads to low efficiency…

arXiv cs.LG TIER_1 English(EN) · Manuel Noah Riesen, Peter Alfred von Niederhäusern · 2026-05-22 04:00

Reinforced Graph of Thoughts: RL-Driven Adaptive Prompting for LLMs

arXiv:2605.22195v1 Announce Type: new Abstract: Graph of Thoughts (GoT), a generalized form of recent prompting paradigms for large language models (LLMs), has been shown to be useful for elaborate problem solving. By executing a graph of operations, thoughts of the LLM are struc…

arXiv cs.LG TIER_1 English(EN) · Di He, Songjun Tu, Keyu Wang, Lu Yin, Shiwei Liu · 2026-05-22 04:00

One Learning Rate Does Not Fit All: Heavy-Tail-Guided Layer-Wise Learning Rates for LLMs

arXiv:2605.22297v1 Announce Type: new Abstract: Learning rate configuration is a fundamental aspect of modern deep learning. The prevailing practice of applying a uniform learning rate across all layers overlooks the structural heterogeneity of Transformers, potentially limiting …

arXiv cs.LG TIER_1 English(EN) · Athanasios Glentis, Jiaxiang Li, Andi Han, Mingyi Hong · 2026-05-22 04:00

Memory-Efficient LLM Pre-Training via Minimalist Optimizer Design

arXiv:2506.16659v3 Announce Type: replace Abstract: Training large language models (LLMs) relies on adaptive optimizers such as Adam, which introduce extra operations and require significantly more memory to maintain first- and second-order moments than SGD. While recent works su…

arXiv cs.LG TIER_1 English(EN) · Tom Segal, Asaf Shabtai, Yuval Elovici · 2026-05-22 04:00

Provably Protecting Fine-Tuned LLMs from Training Data Extraction While Preserving Utility

arXiv:2602.00688v2 Announce Type: replace Abstract: Fine-tuning large language models (LLMs) on sensitive datasets raises privacy concerns, as training data extraction (TDE) attacks can expose highly confidential information. Existing defenses against such attacks either lack for…

arXiv cs.LG TIER_1 English(EN) · Rosie Zhao, Anshul Shah, Xiaoyu Zhu, Xinke Deng, Zhongyu Jiang, Yang Yang, Joerg Liebelt, Arnab Mondal · 2026-05-22 04:00

On the Robustness and Chain-of-Thought Consistency of RL-Finetuned Vision-Language Models (VLMs)

arXiv:2602.12506v3 Announce Type: replace Abstract: Reinforcement learning (RL) finetuning has become a key technique for enhancing large language models (LLMs) on reasoning-intensive tasks, motivating its extension to vision-language models (VLMs). While RL-tuned VLMs improve on…

arXiv cs.LG TIER_1 English(EN) · Huilin Zhou, Jian Zhao, Yilu Zhong, Zhen Liang, Xiuyuan Chen, Yuchen Yuan, Tianle Zhang, Chi Zhang, Lan Zhang, Xuelong Li · 2026-05-22 04:00

Metis: Learning to Jailbreak LLMs via Adaptive Metacognitive Policy Optimization

arXiv:2605.10067v3 Announce Type: replace Abstract: Red teaming is critical for uncovering vulnerabilities in Large Language Models (LLMs). While automated methods have improved scalability, existing approaches often rely on static heuristics or stochastic search, rendering them …

arXiv cs.LG TIER_1 English(EN) · Hongbin Zhang, Chaozheng Wang, Kehai Chen, Youcheng Pan, Yang Xiang, Jinpeng Wang, Min Zhang · 2026-05-22 04:00

Teaching to Each Student's Aptitude: Direction-Adaptive Self-Distillation for LLM Reasoning

arXiv:2605.22263v1 Announce Type: new Abstract: On-policy self-distillation (OPSD) is an emerging LLM post-training paradigm in which the model serves as its own teacher: conditioned on privileged information such as a reference trace or hint, the same policy provides dense token…

arXiv cs.AI TIER_1 English(EN) · Akshay Manglik, Apaar Shanker, Kaustubh Deshpande, Jason Qin, Yash Maurya, Veronica Chatrath, Vijay S. Kalmath, Levi Lentz, Yuan (Emily) Xue · 2026-05-22 04:00

Insight Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents

arXiv:2605.21347v2 Announce Type: new Abstract: Diagnosing failures in LLM agents remains largely manual. Practitioners inspect a small subset of execution traces, form ad-hoc hypotheses, and iterate. This process misses patterns that only emerge across trace populations and does…

arXiv cs.AI TIER_1 English(EN) · Can Hankendi, Rana Shahout, Minlan Yu, Ayse K. Coskun · 2026-05-22 04:00

PALS: Power-Aware LLM Serving for Mixture-of-Experts Models

arXiv:2605.21427v1 Announce Type: new Abstract: Large language model (LLM) inference has become a dominant workload in modern data centers, driving significant GPU utilization and energy consumption. While prior systems optimize throughput and latency by batching, scheduling, and…

arXiv cs.AI TIER_1 English(EN) · Aisvarya Adeseye, Jouni Isoaho, Adeyemi Adeseye · 2026-05-22 04:00

Parallel LLM Inference for Bias-Resilient, Robust Concept Abstraction

arXiv:2605.20194v1 Announce Type: cross Abstract: Large language models (LLMs) have been increasingly used to analyze text. However, they are often plagued with contextual reasoning limitations when analyzing long documents. When long documents are processed sequentially, early o…

arXiv cs.AI TIER_1 English(EN) · Reese Levine, Rithik Sharma, Nikhil Jain, Abhijit Ramesh, Zheyuan Chen, Neha Abbas, James Contini, Tyler Sorensen · 2026-05-22 04:00

Llamas on the Web: Memory-Efficient, Performance-Portable, Multi-Precision WebGPU Inference for LLMs

arXiv:2605.20706v1 Announce Type: cross Abstract: Running language models in the browser presents a unique opportunity to build efficient, private, and portable AI applications, but requires contending with constrained memory availability and heterogeneous hardware targets. To re…

arXiv cs.AI TIER_1 English(EN) · Yicheng Feng, Xin Tan, Yangtao Deng, Yimin Jiang, Yibo Zhu, Hong Xu · 2026-05-22 04:00

Frontier: Towards Comprehensive and Accurate LLM Inference Simulation

arXiv:2605.21312v1 Announce Type: cross Abstract: Modern LLM serving is no longer homogeneous or monolithic. Production systems now combine disaggregated execution, complex parallelism, runtime optimizations, and stateful workloads such as reasoning, agents, and RL rollouts. Simu…

arXiv cs.AI TIER_1 English(EN) · Jaemin Kim, Hangeol Chang, Hyunmin Hwang, Choonghan Kim, Jong Chul Ye · 2026-05-22 04:00

Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs

arXiv:2505.19075v3 Announce Type: replace Abstract: Large Language Models (LLMs) have demonstrated remarkable general capabilities, but enhancing skills such as reasoning often demands substantial computational resources and may compromise generalization. While Parameter-Efficien…

arXiv cs.AI TIER_1 English(EN) · Qizheng Li, Yifei Zhang, Xiao Yang, Xu Yang, Zhuo Wang, Weiqing Liu, Jiang Bian · 2026-05-22 04:00

FT-Dojo: Autonomous LLM Fine-Tuning for Language Agents

arXiv:2603.01712v2 Announce Type: replace Abstract: Fine-tuning large language models for vertical domains remains labor-intensive, requiring practitioners to curate data, configure training, and iteratively diagnose model behavior. Despite growing interest in autonomous machine …

arXiv cs.AI TIER_1 English(EN) · Xian Wu, Kaijie Zhu, Ying Zhang, Lun Wang, Wenbo Guo · 2026-05-22 04:00

rePIRL: Learning PRMs via Inverse Reinforcement Learning for LLM Reasoning

arXiv:2602.07832v2 Announce Type: replace-cross Abstract: Process rewards have been widely used in deep reinforcement learning to improve training efficiency, reduce variance, and prevent reward hacking. In LLM reasoning, existing works also explore various solutions for learning…

arXiv cs.AI TIER_1 English(EN) · Mengtian Yang, Zhekun Zhang, Mingheng Wu, Jianwen Yan, Hanshi Sun, Li-wen Chang · 2026-05-22 04:00

Charon: A Unified Fine-Grained Simulator for Large-Scale LLM Training and Inference

arXiv:2605.17164v2 Announce Type: replace-cross Abstract: Deploying large-scale LLM training and inference with optimal performance is exceptionally challenging due to a complex design space of parallelism strategies, system optimizations, and hardware configurations. Accurate an…

arXiv cs.AI TIER_1 English(EN) · Zikai Alex Wen · 2026-05-22 04:00

Supporting User Understanding of LLM Agent Skill Specifications

arXiv:2605.19362v2 Announce Type: replace-cross Abstract: Users often interpret and select agent skills through their SKILL markdown specifications. To protect users, existing audits mainly focus on malicious or unsafe skills. We study the complementary question of whether specif…

arXiv cs.CL TIER_1 English(EN) · Zhenwei Tang, Zhaoyan Liu, Rasa Hosseinzadeh, Tongzi Wu, Keyvan Golestan, Jesse C. Cresswell · 2026-05-22 04:00

RankJudge: A Synthetic Benchmark Generator for Multi-Turn LLM-as-a-Judge

arXiv:2605.21748v1 Announce Type: new Abstract: As interactive LLM-based applications are created and refined, model developers need to evaluate the quality of generated text along many possible axes. For simpler systems, human evaluation may be practical, but in complicated syst…

arXiv cs.CL TIER_1 English(EN) · Xiaoyuan Li, Yubo Ma, Chengpeng Li, Fengbin Zhu, Yiyao Yu, Keqin Bao, Wenjie Wang, Fuli Feng, Dayiheng Liu · 2026-05-22 04:00

Unified Data Selection for LLM Reasoning

arXiv:2605.22389v1 Announce Type: new Abstract: Effectively training Large Language Models (LLMs) for complex, long-CoT reasoning is often bottlenecked by the need for massive high-quality reasoning data. Existing methods are either computationally expensive or fail to reliably d…

arXiv cs.CL TIER_1 English(EN) · Arip Asadulaev, Daniil Ognev, Karim Salta, Martin Takac · 2026-05-22 04:00

The Value-Gradient Hypothesis for LLM Reinforcement Learning

arXiv:2605.21654v1 Announce Type: cross Abstract: Reinforcement learning substantially improves pretrained language models, but it remains understudied why critic-free methods such as PPO and GRPO work as well as they do, and when they should provide the largest gains. We develop…

arXiv cs.CL TIER_1 English(EN) · Xing Zhang, Yanwei Cui, Guanghui Wang, Ziyuan Li, Wei Qiu, Bing Zhu, Peiyang He · 2026-05-22 04:00

Ratchet: A Minimal Hygiene Recipe for Self-Evolving LLM Agents

arXiv:2605.22148v1 Announce Type: cross Abstract: Self-evolving skill libraries, pioneered by Voyager, let frozen LLM agents accumulate reusable knowledge without weight updates, yet recent evaluation shows that LLM-authored skills deliver $+0.0$pp over no-skill baselines while h…

arXiv cs.CL TIER_1 English(EN) · Fengfei Yu, Ruijia Niu, Dongxia Wu, Yian Ma, Rose Yu · 2026-05-22 04:00

Calibrating LLMs with Semantic-Level Rewards

arXiv:2605.15588v2 Announce Type: replace Abstract: As large language models (LLMs) are deployed in consequential settings such as medical question answering and legal reasoning, the ability to estimate when their outputs are likely to be correct is essential for safe and reliabl…

arXiv cs.CL TIER_1 English(EN) · Alexandre Cristovão Maiorano · 2026-05-22 04:00

LLM Readiness Harness: Evaluation, Observability, and CI Gates for LLM/RAG Applications

arXiv:2603.27355v2 Announce Type: replace-cross Abstract: We present a readiness harness for LLM and RAG applications that turns evaluation into a deployment decision workflow. The system combines automated benchmarks, OpenTelemetry observability, and CI quality gates under a min…

arXiv cs.LG TIER_1 English(EN) · Andy Han, Kristina Fujimoto, Avidan Shah, Kiet Nguyen, Kai Xu, Chen Yueh-Han, Ilia Sucholutsky, Rico Angell · 2026-05-22 04:00

On-Policy Consistency Training Improves LLM Safety with Minimal Capability Degradation

arXiv:2605.21834v1 Announce Type: new Abstract: Aligned models can misbehave in several ways: they are often sycophantic, fall victim to jailbreaks, or fail to include appropriate safety warnings. Consistency training is a promising new alignment paradigm to mitigate such failure…

arXiv cs.LG TIER_1 English(EN) · Yu Li, Rui Miao, Tian Lan, Zhengling Qi · 2026-05-22 04:00

OPPO: Bayesian Value Recursion for Token-Level Credit Assignment in LLM Reasoning

arXiv:2605.21851v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards has become the standard recipe for improving LLM reasoning, but the dominant algorithm GRPO assigns a single trajectory-level advantage to every token, diluting the signal at pivotal re…
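
The trajectory-level credit assignment that this abstract attributes to GRPO can be sketched as follows. This is only an illustration of the baseline behavior described above, not the OPPO method; the function name `grpo_token_advantages`, the use of the population standard deviation, and the `eps` stabilizer are assumptions made for the example.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def grpo_token_advantages(
    rewards: Sequence[float],
    token_counts: Sequence[int],
    eps: float = 1e-6,  # assumed stabilizer to avoid division by zero
) -> List[List[float]]:
    """Return one advantage per token for each rollout in a group.

    Each rollout's advantage is its reward standardized within the group,
    and that single scalar is copied to every token of the rollout.
    """
    if len(rewards) != len(token_counts):
        raise ValueError("rewards and token_counts must have the same length")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std over the group
    trajectory_adv = [(r - mu) / (sigma + eps) for r in rewards]
    # Broadcast: every token inherits the same trajectory-level advantage.
    return [[a] * n for a, n in zip(trajectory_adv, token_counts)]


# Two rollouts for one prompt: the first is verified correct, the second is not.
per_token = grpo_token_advantages(rewards=[1.0, 0.0], token_counts=[3, 2])
print(per_token)
```

Because every token in a rollout receives the same value, a pivotal step and a filler token are credited identically, which is the dilution the abstract refers to.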

arXiv cs.LG TIER_1 English(EN) · Yifan Lan, Yuanpu Cao, Hanyu Wang, Lu Lin, Jinghui Chen · 2026-05-22 04:00

The Illusion of Reasoning: Exposing Evasive Data Contamination in LLMs via Zero-CoT Truncation

arXiv:2605.21856v1 Announce Type: new Abstract: Large language models (LLMs) have demonstrated impressive reasoning abilities across a wide range of tasks, but data contamination undermines the objective evaluation of these capabilities. This problem is further exacerbated by mal…

arXiv cs.CL TIER_1 English(EN) · Jitao Sang · 2026-05-22 02:42

Positional Errors in Long-Context LLMs: A Blind Spot in Reasoning Benchmarks

Position-controlled evaluation is standard for retrieval tasks such as Needle-in-a-Haystack and RULER, but mainstream reasoning benchmarks do not control positional placement of target tasks in long contexts. We audit 11 long-context benchmarks and find none jointly controls task…

arXiv cs.CL TIER_1 English(EN) · Dayiheng Liu · 2026-05-21 12:21

Unified Data Selection for LLM Reasoning

Effectively training Large Language Models (LLMs) for complex, long-CoT reasoning is often bottlenecked by the need for massive high-quality reasoning data. Existing methods are either computationally expensive or fail to reliably distinguish high- from low-quality reasoning samp…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-21 08:20

Ratchet: A Minimal Hygiene Recipe for Self-Evolving LLM Agents

Self-evolving skill libraries, pioneered by Voyager, let frozen LLM agents accumulate reusable knowledge without weight updates, yet recent evaluation shows that LLM-authored skills deliver $+0.0$pp over no-skill baselines while human-curated ones deliver $+16.2$pp: the bottlenec…

arXiv cs.CL TIER_1 English(EN) · Peiyang He · 2026-05-21 08:20

Ratchet: A Minimal Hygiene Recipe for Self-Evolving LLM Agents

Self-evolving skill libraries, pioneered by Voyager, let frozen LLM agents accumulate reusable knowledge without weight updates, yet recent evaluation shows that LLM-authored skills deliver $+0.0$pp over no-skill baselines while human-curated ones deliver $+16.2$pp: the bottlenec…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-21 00:00

The Illusion of Reasoning: Exposing Evasive Data Contamination in LLMs via Zero-CoT Truncation

A black-box detection method called Zero-CoT Probe is introduced to identify data contamination in large language models by truncating reasoning processes and comparing performance on original and perturbed datasets.

arXiv cs.CL TIER_1 English(EN) · Yu Meng · 2026-05-20 17:53

Only Minimal RLVR Training Needed: Extrapolating LLMs via Rank-1 Trajectories

Reinforcement learning with verifiable rewards (RLVR) has become a dominant paradigm for improving reasoning in large language models (LLMs), yet the underlying geometry of the resulting parameter trajectories remains underexplored. In this work, we demonstrate that RLVR weight t…

arXiv cs.AI TIER_1 English(EN) · Ayse K. Coskun · 2026-05-20 17:19

PALS: Power-Aware LLM Serving for Mixture-of-Experts Models

Large language model (LLM) inference has become a dominant workload in modern data centers, driving significant GPU utilization and energy consumption. While prior systems optimize throughput and latency by batching, scheduling, and parallelism, they largely treat GPU power as a …

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-20 17:19

PALS: Power-Aware LLM Serving for Mixture-of-Experts Models

Large language model (LLM) inference has become a dominant workload in modern data centers, driving significant GPU utilization and energy consumption. While prior systems optimize throughput and latency by batching, scheduling, and parallelism, they largely treat GPU power as a …

arXiv cs.AI TIER_1 English(EN) · Xue · 2026-05-20 16:13

Insight Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents

Diagnosing failures in LLM agents remains largely manual. Practitioners inspect a small subset of execution traces, form ad-hoc hypotheses, and iterate. This process misses patterns that only emerge across trace populations and does not scale to production corpora where individua…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-20 16:13

Insight Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents

Diagnosing failures in LLM agents remains largely manual. Practitioners inspect a small subset of execution traces, form ad-hoc hypotheses, and iterate. This process misses patterns that only emerge across trace populations and does not scale to production corpora where individua…

arXiv cs.AI TIER_1 English(EN) · Hong Xu · 2026-05-20 15:40

Frontier: Towards Comprehensive and Accurate LLM Inference Simulation

Modern LLM serving is no longer homogeneous or monolithic. Production systems now combine disaggregated execution, complex parallelism, runtime optimizations, and stateful workloads such as reasoning, agents, and RL rollouts. Simulation is attractive for exploring this growing de…

arXiv cs.AI TIER_1 English(EN) · Tyler Sorensen · 2026-05-20 05:05

Llamas on the Web: Memory-Efficient, Performance-Portable, Multi-Precision WebGPU Inference for LLMs

Running language models in the browser presents a unique opportunity to build efficient, private, and portable AI applications, but requires contending with constrained memory availability and heterogeneous hardware targets. To realize this opportunity, we present Llamas on the W…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-20 00:00

RankJudge: A Synthetic Benchmark Generator for Multi-Turn LLM-as-a-Judge

A benchmark generator called RankJudge evaluates large language model judges on multi-turn conversations by creating flawed conversation pairs and using statistical models for ranking and difficulty assessment.

arXiv cs.CL TIER_1 English(EN) · Yuzhang Shang · 2026-05-19 17:59

TIDE: Efficient and Lossless MoE Diffusion LLM Inference with I/O-Aware Expert Offloading

Diffusion Large Language Models (dLLMs) have emerged as a competitive alternative to autoregressive (AR) models, offering better hardware utilization and bidirectional context through parallel block-level decoding. However, as dLLMs continue to scale up with mixture-of-experts (M…

arXiv cs.CL TIER_1 English(EN) · Yinghuan Shi · 2026-05-19 13:44

Are Tools Always Beneficial? Learning to Invoke Tools Adaptively for Dual-Mode Multimodal LLM Reasoning

Tool-augmented reasoning has emerged as a promising direction for enhancing the reasoning capabilities of multimodal large language models (MLLMs). However, existing studies mainly focus on enabling models to perform tool invocation, while neglecting the necessity of invoking too…

arXiv cs.AI TIER_1 English(EN) · Egor Shvetsov · 2026-05-19 12:48

Prior Knowledge or Search? A Study of LLM Agents for Hardware-Aware Code Optimization

LLM discovery and optimization systems are increasingly applied across domains, implementing a common propose-evaluate-revise loop. Such optimization or discovery progresses via context conditioning on received feedback from an environment. However, as modern LLM agents are incre…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-19 12:48

Prior Knowledge or Search? A Study of LLM Agents for Hardware-Aware Code Optimization

LLM discovery and optimization systems are increasingly applied across domains, implementing a common propose-evaluate-revise loop. Such optimization or discovery progresses via context conditioning on received feedback from an environment. However, as modern LLM agents are incre…

arXiv cs.CL TIER_1 English(EN) · Xuanjing Huang · 2026-05-19 09:40

LLMEval-Logic: A Solver-Verified Chinese Benchmark for Adversarially Hardened Logical Reasoning in LLMs

Evaluating large language models (LLMs) on natural-language logical reasoning is essential because rule-governed tasks require conclusions to follow strictly from stated premises. Many existing logical-reasoning benchmarks are generated by templating natural-language items from s…

arXiv cs.CL TIER_1 English(EN) · Jieping Ye · 2026-05-19 06:42

Tracing Back to Where It Deviates: Mitigating Dual Exposure Bias in LLM Reasoning Distillation

Large language models (LLMs) have achieved remarkable success in complex reasoning tasks via long chain-of-thought (CoT), yet their immense computational overhead hinders real-world deployment. LLM reasoning distillation addresses this by transferring reasoning capabilities from …

arXiv cs.CL TIER_1 English(EN) · Jitao Sang · 2026-05-19 04:41

Taming the Thinker: Conditional Entropy Shaping for Adaptive LLM Reasoning

Entropy-based deep reasoning has emerged as a promising direction for improving the reasoning capabilities of Large Language Models (LLMs), but existing methods often either increase response length indiscriminately or shorten responses at the cost of accuracy. To better balance …

arXiv cs.CL TIER_1 English(EN) · Hua Wei · 2026-05-19 00:57

Diagnosing Multi-Step Reasoning Failures in Black-Box LLMs via Stepwise Confidence Attribution

Large Language Models have achieved strong performance on reasoning tasks with objective answers by generating step-by-step solutions, but diagnosing where a multi-step reasoning trace might fail remains difficult. Confidence estimation offers a diagnostic signal, yet existing me…

arXiv cs.AI TIER_1 English(EN) · Pascal Van Hentenryck · 2026-05-18 17:28

LLM-Guided Model Patching Democratizes Re-Optimization at Scale

Optimization models developed by operations research (OR) experts are often deployed as decision-support systems in industrial settings. However, real-world environments are dynamic, with evolving business rules, previously overlooked constraints, and unforeseen perturbations. In…

arXiv cs.AI TIER_1 English(EN) · Shaowu Pan · 2026-05-18 16:34

SCICONVBENCH: Benchmarking LLMs on Multi-Turn Clarification in Computational Science Task Formulation

Large Language Models (LLMs) are increasingly deployed as scientific AI assistants, and a growing body of benchmarks evaluates their capabilities across knowledge retrieval, reasoning, code generation, and tool use. These evaluations, however, typically assume the scientific pr…

arXiv cs.CL TIER_1 English(EN) · Song Guo · 2026-05-18 08:54

KVDrive: A Holistic Multi-Tier KV Cache Management System for Long-Context LLM Inference

Supporting long-context LLMs is challenging due to the substantial memory demands of the key-value (KV) cache. Existing offloading systems store the full cache in host memory and selectively fetch critical entries during decoding, but this strategy quickly hits a ceiling: sparsit…

arXiv cs.CL TIER_1 English(EN) · Maosong Sun · 2026-05-18 07:33

AutoVecCoder: Teaching Large Language Models to Generate Explicitly Vectorized Code

Vectorization via Single Instruction, Multiple Data (SIMD) architectures is a cornerstone of high-performance computing. To fully exploit hardware potential, developers often resort to explicit vectorization using intrinsics, as compiler-based auto-vectorization frequently yields…

arXiv cs.MA (Multiagent) TIER_1 English(EN) · James Evans · 2026-05-16 23:29

Multi-LLM Systems Exhibit Robust Semantic Collapse

Whether machines can originate novel content has been debated for nearly two centuries, from Lovelace's assertion that no engine can "originate anything" to Turing's question of whether a machine can amplify ideas brought in from outside. Multi-large language model (LLM) systems,…

arXiv cs.LG TIER_1 English(EN) · Wes Armour · 2026-05-15 17:03

Runtime-Orchestrated Second-Order Optimization for Scalable LLM Training

Second-order methods offer an attractive path toward more sample-efficient LLM training, but their practical use is often blocked by the systems cost of maintaining and updating large matrix-based optimizer states. We introduce Asteria, a runtime system designed to remov…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-15 00:00

Rule2DRC: Benchmarking LLM Agents for DRC Script Synthesis with Execution-Guided Test Generation

Rule2DRC introduces a large-scale benchmark for DRC script synthesis with 1,000 rule-to-script tasks and 13,921 evaluation layouts, along with SplitTester which improves program selection through execution-based feedback.

arXiv cs.CV TIER_1 English(EN) · Haohuan Fu · 2026-06-30 16:08

Attend, Transform, or Stay Silent: Operator-Level Visual Skipping for Efficient Multimodal LLM Inference

Multimodal large language models (MLLMs) increasingly process long visual-token sequences, increasing the overall inference computation. Existing acceleration methods usually remove visual tokens or skip visual-token updates in entire layers, but these coarse strategies may disca…

arXiv stat.ML TIER_1 English(EN) · Johannes Zenn, Jonas Geiping · 2026-06-26 04:00

Sequence Probability and LLM Correctness: When Are Likely Answers Correct?

arXiv:2606.27359v1 Announce Type: new Abstract: Many decoding methods for large language models can be understood as shifting probability mass toward outputs that are more likely under the model, either locally at the token level or globally at the sequence level. Therefore, thei…

arXiv stat.ML TIER_1 English(EN) · Jonas Geiping · 2026-06-25 17:58

When Are Likely Answers Correct? On Sequence Probability and the Correctness of LLMs

Many decoding methods for large language models can be understood as shifting probability mass toward outputs that are more likely under the model, either locally at the token level or globally at the sequence level. Therefore, their success depends on a fundamental question: whe…

LessWrong (AI tag) TIER_1 English(EN) · Josh Engels · 2026-06-22 22:26

LLM-Driven Feature Discovery

We would often like to get a qualitative sense of a target model’s behaviors in important distributions (e.g. deployment, RL training, or evals). For example, we might want to discover novel behaviors (https://alignment.anthropic.com/2026/petri-v2/)…

arXiv stat.ML TIER_1 English(EN) · Etienne Casanova, Rafal Kocielnik, R. Michael Alvarez · 2026-06-02 04:00

Limits of LLM Adaptability: The Impact of Model-Internalized Priors on Annotation Task Performance

arXiv:2606.00467v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly used for zero-shot annotation and LLM-as-a-judge tasks, yet their reliability hinges on how model-internalized priors interact with user-provided instructions. We investigate three dim…

arXiv stat.ML TIER_1 English(EN) · Jingkai Huang, Will Ma, Zhengyuan Zhou · 2026-06-02 04:00

Optimal Bayesian Stopping for Efficient Inference of Consistent LLM Answers

arXiv:2602.05395v2 Announce Type: replace Abstract: A simple strategy for improving LLM accuracy, especially in math and reasoning problems, is to sample multiple responses and submit the answer most consistently reached. In this paper we leverage Bayesian prior information to sa…
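
The baseline strategy this abstract starts from, sampling several responses and submitting the answer reached most consistently, can be sketched as a plain majority vote. This is not the paper's Bayesian stopping rule, whose details are truncated here; `most_consistent_answer` and the canned sampler are hypothetical names introduced only for illustration.

```python
from collections import Counter
from typing import Callable, Hashable, Tuple


def most_consistent_answer(
    sample: Callable[[], Hashable], n: int
) -> Tuple[Hashable, float]:
    """Draw n answers from `sample` and return the most frequent one with its vote share."""
    if n <= 0:
        raise ValueError("n must be positive")
    votes = Counter(sample() for _ in range(n))
    answer, count = votes.most_common(1)[0]
    return answer, count / n


# Hypothetical stand-in for an LLM: replays a fixed list of final answers.
_canned = iter(["42", "41", "42", "42", "7"])
answer, share = most_consistent_answer(lambda: next(_canned), n=5)
print(answer, share)
```

A fixed `n` spends the same sampling budget on every question regardless of how quickly the votes agree, which is the cost an adaptive stopping rule would aim to reduce.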

arXiv cs.CV TIER_1 English(EN) · Hyeonwoo Cho, DongHyeon Baek, Yewon Kim, Bumsub Ham · 2026-06-02 04:00

Improving Visual Token Reduction by Correcting Distortion for Efficient Multimodal LLM Inference

arXiv:2606.01711v1 Announce Type: new Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) have achieved remarkable success in vision-language tasks, yet the quadratic computational complexity arising from the vast number of visual tokens incurs significant m…

arXiv stat.ML TIER_1 English(EN) · R. Michael Alvarez · 2026-05-30 01:21

Limits of LLM Adaptability: The Impact of Model-Internalized Priors on Annotation Task Performance

Large Language Models (LLMs) are increasingly used for zero-shot annotation and LLM-as-a-judge tasks, yet their reliability hinges on how model-internalized priors interact with user-provided instructions. We investigate three dimensions of this interaction: (1) how an LLM's fami…

arXiv stat.ML TIER_1 English(EN) · Jiachun Li, David Simchi-Levi, Will Wei Sun · 2026-05-29 04:00

Low Rank and Rank Correlation: Uncertainty-Aware Task-Specific LLM Ranking under Sparse Pairwise Comparisons

arXiv:2605.29395v1 Announce Type: cross Abstract: Pairwise human-preference platforms such as Chatbot Arena have become central to large language model (LLM) evaluation, yet reliable task-specific ranking remains challenging. Global leaderboards mask task heterogeneity, while ran…

arXiv stat.ML TIER_1 English(EN) · Will Wei Sun · 2026-05-28 05:44

Low Rank and Rank Correlation: Uncertainty-Aware Task-Specific LLM Ranking under Sparse Pairwise Comparisons

Pairwise human-preference platforms such as Chatbot Arena have become central to large language model (LLM) evaluation, yet reliable task-specific ranking remains challenging. Global leaderboards mask task heterogeneity, while ranking each fine-grained task independently is unsta…

arXiv stat.ML TIER_1 English(EN) · Paula Cordero-Encinar, Georgy Tyukin, Andrew B. Duncan · 2026-05-28 04:00

Soft Experts: $\alpha$-Rényi Ensembles for Uncertainty-Aware LLM Post-Training

arXiv:2605.27747v1 Announce Type: new Abstract: Existing training approaches for large language models learn a single set of parameters, based on large volumes of data, which is typically heterogeneous, conflicting and often outright contradictory. As a result, the model is force…

arXiv stat.ML TIER_1 English(EN) · Shijin Gong, Erhan Xu, Kai Ye, Francesco Quinzan, Giulia Livieri, Chengchun Shi · 2026-05-27 04:00

BASIS: Batch Advantage Estimation with Single-Rollout Information Sharing for LLM Reasoning

arXiv:2605.27293v1 Announce Type: cross Abstract: Reinforcement learning with verifiable rewards has become a standard recipe for improving the reasoning abilities of large language models. Existing algorithms face a tradeoff between computational efficiency and sample efficiency…

arXiv stat.ML TIER_1 English(EN) · Andrew B. Duncan · 2026-05-26 22:44

Soft Experts: $\alpha$-Rényi Ensembles for Uncertainty-Aware LLM Post-Training

Existing training approaches for large language models learn a single set of parameters, based on large volumes of data, which is typically heterogeneous, conflicting and often outright contradictory. As a result, the model is forced to compress conflicting goals, and inherent un…

arXiv stat.ML TIER_1 English(EN) · Chengchun Shi · 2026-05-26 17:06

BASIS: Batch Advantage Estimation for LLM Reasoning via Single-Trajectory Information Sharing

Reinforcement learning with verifiable rewards has become a standard recipe for improving the reasoning abilities of large language models. Existing algorithms face a tradeoff between computational efficiency and sample efficiency in value estimation and policy learning. We intro…

arXiv cs.CV TIER_1 English(EN) · Zehao Wang, Yihan Zeng, Zidong Gong, Yuanfan Guo, Feng Zhu, Hongzhi Zhang, Wei Zhang, Wangmeng Zuo · 2026-05-26 04:00

AnE: Pushing the Reasoning Frontier of Multimodal Large Language Models via Anchor Evolution

arXiv:2605.25571v1 Announce Type: new Abstract: Post-training via Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) is crucial for enhancing reasoning in Multimodal Large Language Models (MLLMs), yet existing paradigms often reach a performance bottleneck due to the li…

arXiv stat.ML TIER_1 English(EN) · Junghyun Lee, Sanghwa Kim, Yassir Jedra, Alexandre Proutière, Se-Young Yun · 2026-05-25 04:00

Instance-Optimal Estimation with Multiple LLM Judges under a Limited Budget

arXiv:2605.23362v1 Announce Type: cross Abstract: Evaluating large language models increasingly relies on LLM-as-a-judge protocols, but such evaluations remain costly: different judges have different prices and reliabilities, and the difficulty of each prompt-response pair can va…

arXiv stat.ML TIER_1 English(EN) · Weijie Su · 2026-05-23 01:18

CurveRL: Principled, Distribution-Aware Context Reweighting for LLM Reasoning

Context or prompt-level reweighting has emerged as a central algorithmic lever in Reinforcement Learning with Verified Rewards (RLVR) for improving the reasoning capability of large language models, yet the principle determining what constitutes an optimal weighting remains poorl…

arXiv stat.ML TIER_1 English(EN) · Se-Young Yun · 2026-05-22 08:26

Instance-Optimal Estimation with Multiple LLM Judges under a Limited Budget

Evaluating large language models increasingly relies on LLM-as-a-judge protocols, but such evaluations remain costly: different judges have different prices and reliabilities, and the difficulty of each prompt-response pair can vary substantially. This raises a basic allocation q…

arXiv stat.ML TIER_1 English(EN) · Hamed Khosravi, Xiaoming Huo · 2026-05-21 04:00

Conformal Selective Acting: Anytime-Valid Risk Control for RLVR-Trained LLMs

arXiv:2605.20270v1 Announce Type: cross Abstract: A local specialist LLM, fine-tuned with reinforcement learning from verifiable rewards (RLVR) on operator-local data, is installed in a regulated organization with per-deployment error budget $\alpha$. The operator needs a safety …

arXiv stat.ML TIER_1 English(EN) · J. G. Dai, Tianze Deng, Yueying Li, Tianyi Peng · 2026-05-19 04:00

Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents

arXiv:2504.07347v3 Announce Type: replace Abstract: As demand for Large Language Models (LLMs) and AI agents grows rapidly, optimizing systems for efficient LLM inference becomes critical. While significant efforts have targeted system-level engineering, little has been explored …

arXiv stat.ML TIER_1 English(EN) · Xiaoming Huo · 2026-05-18 22:20

Conformal Selective Acting: Anytime-Valid Risk Control for RLVR-Trained LLMs

A local specialist LLM, fine-tuned with reinforcement learning from verifiable rewards (RLVR) on operator-local data, is installed in a regulated organization with per-deployment error budget $α$. The operator needs a safety certificate for this deployment's stream at every round…

arXiv stat.ML TIER_1 English(EN) · Ruicheng Ao, Gan Luo, David Simchi-Levi, Xinshang Wang · 2026-05-18 04:00

Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints

arXiv:2504.11320v3 Announce Type: replace-cross Abstract: Large language models now serve millions of users daily, with providers incurring costs exceeding $700,000 per day. Each request requires token-by-token inference, making GPU scheduling central to latency, capacity, and co…

雷峰网 (Leiphone) TIER_1 中文(ZH) · 2026-06-29 06:36

ICML 2026 | When Large Models Start Inventing Their Own Language: How to Get LLMs to Do Intensive Reasoning with Fewer Tokens

Original author: WeChat official account “专知”. Original link: https://mp.weixin.qq.com/s/GYp8zFf-C5pXqHMSDNT2Aw …

AWS Machine Learning Blog TIER_1 English(EN) · Sandeep Raveesh-Babu · 2026-05-29 23:36

Comprehensive Observability for Amazon SageMaker AI LLM Inference: From GPU Utilization to LLM Quality

This post demonstrates a comprehensive observability solution using Amazon Managed Grafana dashboards that provides a holistic view of both quality and quantity for LLMs served on Amazon SageMaker AI endpoints with inference components.

Databricks Blog TIER_1 English(EN) · 2026-05-27 20:20

Reliable LLM Inference at Scale

At Databricks, we’ve built a unique inference platform that serves every frontier...

Databricks Blog TIER_1 English(EN) · 2026-05-22 20:00

Accelerating LLM Inference for Open-Source Models with Prompt Caching on Databricks

Why Prompt Caching Matters: Large language model (LLM) inference often involves repeated...

Together AI blog TIER_1 English(EN) · 2026-04-03 00:00

AI for Systems: Using LLMs to Optimize Database Query Execution

New research shows LLMs can optimize database query execution plans—achieving up to 4.78x speedups by correcting the cardinality estimation errors that statistical heuristics miss.

Together AI blog TIER_1 English(EN) · 2025-06-11 00:00

Introducing the Together AI Batch API: Process Thousands of LLM Requests at 50% of the Cost

Together AI blog TIER_1 English(EN) · 2025-05-28 00:00

Mixture-of-Agents Alignment: Harnessing the Collective Intelligence of Open-Source LLMs to Improve Post-Training

Anyscale blog TIER_1 English(EN) · 2026-06-18 09:00

High-Performance Distributed Inference with Ray Serve LLM

Learn how the Ray Serve LLM + vLLM stack achieves up to 24x higher throughput with direct streaming, HAProxy integration, and a new vLLM Ray executor backend.

Hacker News — AI stories ≥50 points TIER_1 English(EN) · AMavorParker · 2026-05-20 21:11

PopuLoRA: Co-Evolving LLM Populations for Reasoning Self-Play

Medium — fine-tuning tag TIER_1 한국어(KO) · YouShin kim · 2026-07-04 00:57

UCLA and Optum AI: An Innovation in Training LLM Reasoning Models, or How to Select High-Quality Data by Looking Only at the "First 100 Tokens"

Towards AI TIER_1 English(EN) · Suchitra Malimbada · 2026-07-02 14:31

Why 4-Bit Weights Are Easy and 8-Bit Activations Break Models: Inside LLM Inference, Part 3

A systems-level mental model of quantization, built from the asymmetry that explains every method in the field. Quantizing the weights of a large languag…

Medium — MLOps tag TIER_1 English(EN) · Hatemazaiez · 2026-07-02 10:43

I Measured How Inference Concurrency Silently Degrades LLM Reasoning Quality

Link: https://medium.com/@hatemazaiez1/i-measured-how-inference-concurrency-silently-degrades-llm-reasoning-quality-9074189fce5e?source=rss------mlops-5

Towards AI TIER_1 English(EN) · Dylan Tartarini · 2026-06-29 17:31

Build and Query Knowledge Graphs with LLMs

Link: https://pub.towardsai.net/build-and-query-knowledge-graphs-with-llms-4f39251df792?source=rss----98111c9905da---4

Towards AI TIER_1 English(EN) · Artha Mukherjee · 2026-06-23 22:01

The GPU-Poor's Guide to Local LLM Inference in 2026

MoE math. KV cache quants below q8_0. MCP-based tooling. Worked example: a 35B Mixture-of-Experts on 6 GB of VRAM. …

Medium — MLOps tag TIER_1 English(EN) · Sami · 2026-06-21 14:25

One Number Lies: How to Actually Measure LLM Inference

Link: https://medium.com/@ing.benali.sami/one-number-lies-how-to-actually-measure-llm-inference-0b78e6572a33?source=rss------mlops-5

Medium — fine-tuning tag TIER_1 English(EN) · Jose Miguel Arrieta · 2026-06-20 13:38

LoRA Notes: Fine-Tuning Large Models with Fewer Parameters

Link: https://medium.com/data-science-hub/lora-notes-fine-tuning-large-models-with-fewer-parameters-756cafd5662a?source=rss------fine_tuning-5

Medium — MLOps tag TIER_1 English(EN) · Michiel Horstman · 2026-06-19 23:10

Model Merging for Dummies: Combine LLMs Without Training

Link: https://michielh.medium.com/model-merging-for-dummies-combine-llms-without-training-7d7173c069bc?source=rss------mlops-5

Towards AI TIER_1 English(EN) · ChienLoong · 2026-06-16 12:31

The Inference Reckoning: How to Stop Burning Millions on Cloud LLM Tokens

Imagine checking your enterprise cloud billing dashboard on a Monday morning and seeing a sudden, violent $45,000 spike. …

Medium — MLOps tag TIER_1 English(EN) · The_Turingetic_Guy · 2026-06-15 17:27

Large-Scale Distributed LLM Inference, Part 3: Inference Metrics, Scheduling Strategies, and…

Link: https://medium.com/@the_turingetic_guy/large-scale-distributed-llm-inference-part-3-inference-metrics-scheduling-strategies-and-f115e8933b48?source=rss------mlops-5

dev.to — MCP tag TIER_1 English(EN) · Gurutva Murdia · 2026-06-11 18:24

Introducing Duplex: A Zero-Backend, Multiplexed LLM Inference Engine for True Client-Side Parallel AI

Hi there. I’m Gurutva Murdia, the developer behind Duplex. Today I’m excited to share the story, architecture, and technical deep dives of a project that’s been consuming my focus for months: a fully decentralised, browser-native wrapper that lets you run multiple Large Langu…

Towards AI TIER_1 English(EN) · Abhinandan Malhotra · 2026-06-10 14:31

Optimizing Local LLM Inference on Constrained Hardware

An engineering deep dive into KV cache quantization, asymmetric thread tuning, and PCIe bottlenecks. Introduction: New frontier models launch weekly, and for most developers, the testing phase abruptly ends when the API bill arrives or the rate l…

Medium — MLOps tag TIER_1 English(EN) · Rayari · 2026-06-08 21:15

The Black Box Nobody Talks About: A Deep Dive into LLM Inference Engineering

Link: https://medium.com/@rayari1729/the-black-box-nobody-talks-about-a-deep-dive-into-llm-inference-engineering-e71dd94f4624?source=rss------mlops-5

Medium — MLOps tag TIER_1 English(EN) · jagesh maharjan · 2026-06-07 15:56

LLM Training: The 5D Parallelism Universe

Link: https://medium.com/@JugsMa/llm-training-the-5d-parallelism-universe-ff0045b20bd4?source=rss------mlops-5

Medium — MLOps tag TIER_1 English(EN) · Avishek Jana · 2026-06-04 03:18

Understanding LLM Precision: How Bit Formats Shape Training, Inference, and Quality

Link: https://blog.geogo.in/understanding-llm-precision-how-bit-formats-shape-training-inference-and-quality-1cd0550bd717?source=rss------mlops-5

Towards AI TIER_1 English(EN) · Shakti Wadekar · 2026-05-30 05:14

The Evolution of LLM Inference: Decoding Algorithms, Part 1

LLM inference optimization can be understood along three major axes: memory optimization, compute optimization, and decoding algorithms. Compared to memory and compute optimizations, decoding algorithms are often discussed less, even though they are becoming i…

Medium — MLOps tag TIER_1 English(EN) · The_Turingetic_Guy · 2026-05-24 15:53

Large-Scale Distributed LLM Inference, Part 1

Link: https://medium.com/@rtxtdfs/large-scale-distributed-llm-inference-part-1-54343375c2c4?source=rss------mlops-5

Medium — fine-tuning tag TIER_1 English(EN) · Boring Developer · 2026-05-23 11:26

Fine-Tuning LLM: Building Personality of AI

Link: https://medium.com/@parthbissa5/fine-tuning-llm-building-personality-of-ai-fa74b8a40c0d?source=rss------fine_tuning-5

Medium — fine-tuning tag TIER_1 English(EN) · QuarkAndCode · 2026-05-21 07:48

Fine-Tuning and Alignment: How Domain Adaptation Builds Specialized LLMs

Link: https://medium.com/@QuarkAndCode/fine-tuning-and-alignment-how-domain-adaptation-builds-specialized-llms-7c6d93f66937?source=rss------fine_tuning-5

Medium — fine-tuning tag TIER_1 English(EN) · QuarkAndCode · 2026-05-18 08:26

Why Pretrained LLMs Need Fine-Tuning for Better AI Performance

Link: https://medium.com/@QuarkAndCode/why-pretrained-llms-need-fine-tuning-for-better-ai-performance-6541293f9fef?source=rss------fine_tuning-5

Medium — MLOps tag TIER_1 English(EN) · Charan Panthangi · 2026-05-18 04:38

Inference Optimization: How to Make LLMs Faster and Cheaper in Production

Link: https://medium.com/@charan.panthangi/inference-optimization-how-to-make-llms-faster-and-cheaper-in-production-2778cd00d921?source=rss------mlops-5

dev.to — LLM tag TIER_1 English(EN) · Mudassir Khan · 2026-07-01 19:54

LLM Cost Optimization: Cutting Your Inference Bill Without Sacrificing Quality

You can cut your LLM API spend by 50 to 90% without switching models or degrading output quality. The techniques exist, the docs are public, and most teams are not using them. Here is what actually moves the needle. Where your LLM bill actually comes from: Eve…

dev.to — LLM tag TIER_1 English(EN) · Klinsmann R · 2026-07-01 09:51

Understanding How Large Language Models Work: From Text to Tokens, Embeddings, Transformers, and Predictions

Artificial Intelligence is nothing new. It has been around since the early days of computing and has slowly evolved over time. But today, where we stand with Generative AI, or GenAI, it has become one of the most popular and widely adopted categories of advanced AI. At t…

dev.to — LLM tag TIER_1 English(EN) · Vladyslav Donchenko · 2026-06-29 07:05

LLM Inference Optimization: The Line Item That Decides Whether Your AI Ships

Training gets the headlines. Inference gets the bill. If you run LLMs in production, inference is almost certainly your biggest AI line item — a meter running 24/7 on every request. The gap between naive and optimized serving is routinely 5-10x in cost and 3-5x in late…

dev.to — LLM tag TIER_1 English(EN) · Etrit Neziri · 2026-06-28 21:04

LLM Function Calling: The Complete Guide for Building AI Tools

Function calling (tool use) is the technology that turned LLMs from chatbots into agents. Here's the complete guide. What Is Function Calling? Function calling lets an LLM decid…

dev.to — LLM tag TIER_1 English(EN) · devtocash · 2026-06-27 20:21

Kubernetes LLM Inference: Deploying and Scaling Open-Source LLMs in 2026

Running your own LLMs on Kubernetes isn't just a cost play — it's about latency, data sovereignty, and fine-tuning control. But GPU scheduling at scale is a different beast entirely. Here's what a production K8s LLM inference stack looks like in 2026: vLLM or TGI for th…

dev.to — LLM tag TIER_1 English(EN) · pueding · 2026-06-27 11:21

OpenAI and Broadcom's Jalapeño, a Custom Inference ASIC: Inference ASIC vs GPU

What: The OpenAI and Broadcom Jalapeño announcement (June 24, 2026) is OpenAI's first custom LLM-inference ASIC — a reticle-sized compute chiplet paired with HBM, built to run models rather than train…

dev.to — LLM tag TIER_1 English(EN) · Lycore Development · 2026-06-27 10:19

Structured Outputs: How We Stopped Parsing LLM Responses by Hand

Every team we talk to has a version of the same story. They built an LLM integration that works well in testing. Then, three weeks into production, something comes back slightly different — the model wraps the JSON in a code block, or uses `"status": "Completed"` in…

dev.to — LLM tag TIER_1 English(EN) · arya · 2026-06-27 00:08

What Building an LLM Inference Engine from Scratch Taught Me About Compiler Design

the insight that started this project hit me while i was finishing a bytecode-compiled language i'd written in C. i'd spent months building a hand-written lexer, a single-pass Pratt compiler, a stack VM with 35 opcodes, and a mark-and-sweep garbage collector. and right n…

dev.to — LLM tag TIER_1 English(EN) · Kuldeep Paul · 2026-06-26 18:42

A Guide to the Best LLM Semantic Caching Tools in 2026

dev.to — LLM tag TIER_1 English(EN) · Eric-Octavian · 2026-06-25 17:55

Training an LLM in the Kernel: How IONA AI Does Embeddings, RAG, and Fine-Tuning Without the Cloud

Most AI systems today are cloud-based. You send a prompt to an API, and a model somewhere else generates a response. You don't control the model. You don't control the data. You don't control the infrastructure. IONA AI is the opposite. It runs inside the kernel …

dev.to — LLM tag TIER_1 English(EN) · zeromathai · 2026-06-25 14:15

Why KV Cache Matters: How MQA, GQA, and MLA Make LLM Inference Faster

LLMs generate text one token at a time. That sounds simple. But without KV Cache, every new token would repeat a lot of old work. That is why inference optimization starts with keys and values. Core Idea: KV Cache stores previously compute…

dev.to — LLM tag TIER_1 English(EN) · globose technology solutions · 2026-06-25 12:18

Building Multilingual AI: Best Practices for LLM Datasets

Artificial intelligence has transformed the way businesses communicate, automate processes, and provide personalized customer experiences. As businesses grow to global markets, AI systems need to understand and produce content in many languages while maintaining cultural and r…

dev.to — LLM tag TIER_1 English(EN) · ironbyte-rgb · 2026-06-24 19:00

Real-Time LLM Inference on Standard GPUs: 3k Tokens/s per Request

TL;DR: Real-time LLM inference on standard GPUs can reach 3k tokens/s per request; optimizing the whole software stack with architecture/engine/kernel co-design is crucial for fast inference; standard datacenter GPU hardware has a higher decodin…

r/LocalLLaMA TIER_1 English(EN) · /u/z_latent · 2026-06-24 14:22

OpenAI and Broadcom Unveil an LLM-Optimized Inference Chip

https://openai.com/index/openai-broadcom-jalapeno-inference-chip/ Quoted from the start of the blog post: Early testing shows that the first-gener…

dev.to — LLM tag TIER_1 English(EN) · Ashwin Giridharan · 2026-06-24 06:36

I Built an Interactive 11-Chapter Guide to How LLM Inference Actually Works

Production vLLM is 100,000+ lines of C++, CUDA, and Python. It powers most of the industry's LLM serving — but reading it cold is brutal. So I built a study series around nano-vLLM, an open-source reimplementation of vLLM's core ideas in ~1,200 lines of…

dev.to — LLM tag TIER_1 English(EN) · Manoj Krishna Mohan · 2026-06-23 05:43

I Built a Rust Entropy Monitor to Route LLM Inference: Here's What the Benchmarks Showed

Frontier LLM inference is expensive. I wanted to see how far a 4B local model could go before needing a cloud call — and when the cloud call actually adds value. The result is Buddy System: a tiered inference architecture where a Rust entropy monitor watches per-token u…

dev.to — LLM tag TIER_1 English(EN) · Zhongkai Fu · 2026-06-22 17:09

TensorSharp: An Open-Source, .NET-Native Local LLM Inference Engine

TensorSharp (https://github.com/zhongkaifu/TensorSharp): I would like to share my latest open-source .NET-native local LLM inference engine and applications. It supports many models, like Gemma4, DiffusionGemma, Qwen3.6 with multi-mod…

r/LocalLLaMA TIER_1 English(EN) · /u/carteakey · 2026-06-21 23:01

Local LLM Inference Optimization: The Complete Guide

Link: https://www.reddit.com/r/LocalLLaMA/comments/1uc3wg9/local_llm_inference_optimization_the_complete/

r/MachineLearning TIER_1 English(EN) · /u/YouFirst295 · 2026-06-20 12:27

An Open Handbook on LLM Inference at Scale (GPU Internals, KV Cache, Batching, vLLM/SGLang/TensorRT-LLM) [P]

I've been working through the internals of LLM inference and writing up what I learn as an open, in-progress handbook. Just wrapped another chapter on GPU execution and memory internals: why a GPU sits mostly idle during inference, how the…

dev.to — LLM tag TIER_1 English(EN) · globose technology solutions · 2026-06-15 11:55

Mastering AI Performance with Advanced LLM Dataset Strategies

Artificial intelligence is changing the way businesses operate, innovate, and engage customers. From intelligent virtual assistants to content generation tools, predictive analytics, and enterprise automation, AI has become a catalyst for digital transformation. These developm…

dev.to — LLM tag TIER_1 English(EN) · HelperX · 2026-06-15 05:21

LLM Cost Optimization: How We Cut Reply Generation Cost from $0.011 to $0.0009

When we shipped the first version of AI-generated replies for HelperX (https://helperx.app), each reply cost us about $0.011 in API spend. That sounds tiny until you multiply by 30 replies per slot per day times 200 active slots: roughly $…

dev.to — LLM tag TIER_1 English(EN) · Nolan Vale · 2026-06-12 17:33

Token Cost Optimization: How to Cut LLM Inference Spend Without Sacrificing Quality

There is a version of token cost optimization that I do not recommend: cutting token counts by reducing the quality of your system prompt, your retrieved context, or your response formatting. This approach reduces cost and reduces quality in equal measure. You have not optimiz…

dev.to — LLM tag TIER_1 English(EN) · globose technology solutions · 2026-06-12 12:22

The Role of High-Fidelity LLM Training Datasets in Modern Machine Learning

Large Language Models (LLMs) have revolutionized artificial intelligence by enabling machines to seamlessly generate text, answer complex queries, and translate languages; however, the true catalyst behind these capabilities is high-fidelity training data. As organizations rap…

dev.to — LLM tag TIER_1 English(EN) · BAOFUFAN · 2026-06-10 12:06

The Pitfalls of Testing LLM Long-Term Memory: A 3-Day Debugging Saga

I was jolted awake at 2 a.m. by a PagerDuty alert — users were complaining that the AI “called me Mr. Wang yesterday, but today it doesn’t recognise me at all.” Groggily I pulled up the monitoring dashboards and saw that the vector database’s retrieval latency had spiked, and …

dev.to — LLM tag TIER_1 English(EN) · globose technology solutions · 2026-06-10 11:34

Scaling Generative AI: Best Practices for LLM Dataset Curation and Annotation

Generative AI has revolutionized industries by allowing machines to generate human-like text, images, audio, and code. Any successful Large Language Model (LLM) relies on high-quality data as its bedrock. As organizations accelerate their AI initiatives, effective dataset cura…

dev.to — LLM tag TIER_1 English(EN) · Manoranjan Rajguru · 2026-06-10 05:10

How Xiaomi Hit 1,000 Tokens per Second on a Trillion-Parameter Model: A Deep Dive into LLM Inference Optimization

Meta Description: Xiaomi's MiMo-V2.5-Pro-UltraSpeed just shattered the 1,000 tokens/second barrier on a 1T-parameter model using commodity GPUs. This deep dive unpacks the FP4 quantization, DFlash speculative decoding, and TileRT persistent engine…

dev.to — LLM tag TIER_1 English(EN) · Kotcherla Murali Krishna · 2026-06-09 02:26

PagedAttention vs. Traditional KV Cache: How vLLM Reinvents GPU Memory for LLM Inference

A deep dive into memory fragmentation, paged memory management, and why PagedAttention can deliver up to 24× higher throughput than conventional KV cache implementations. Every token you generate during LLM inference silently eats GPU memory. With traditional KV caching…

dev.to — LLM tag TIER_1 Norsk(NO) · ItsEvilDuck · 2026-06-08 19:28

Fast LLM Token Counter: Estimating Token Counts for GPT Models

Today I'm sharing a new utility, the Fast LLM Token Counter. This tool is built to provide quick token count estimations for any given text input. It uses OpenAI's `tiktoken` library, which is the same method OpenAI uses. This allows for accurate predictions …

dev.to — LLM tag TIER_1 English(EN) · Abhinav Tripathi · 2026-06-08 12:38

Learning LLMs by Training a 1B-Parameter Model from Scratch on Strix Halo

About 1 year ago, AMD released their AI Max+ series CPUs (https://www.amd.com/en/products/processors/laptop/ryzen/ai-300-series/amd-ryzen-ai-max-plus-395.html), aka `Strix Halo`. It seemed that all of my youtube feed was filled…

dev.to — LLM tag TIER_1 English(EN) · globose technology solutions · 2026-06-08 12:23

The Hidden Power Behind Generative AI: LLM Training Datasets

Generative AI has transformed the way we create content, automate workflows, and interact with technology. From writing articles and generating code to creating realistic images and answering complex questions, Large Language Models (LLMs) are powering a new era of artificial …

dev.to — LLM tag TIER_1 English(EN) · soy · 2026-06-07 21:33

New `llama.cpp` Updates, AI Agents for Any LLM, and Quantized Vector Index for Local Inference

Today's Highlights: Today's top stories highlight advancements in efficient local AI, starting with core `llama.cpp` updates for faster LLM…

r/LocalLLaMA TIER_1 English(EN) · /u/LMTLS5 · 2026-06-06 18:12

Built an inference-focused library for zeroth-order optimization of LLMs. Built on GGML. 39x faster forward pass, 15x faster for one MeZO step. [P]

i felt like zero order optimization in PyTorch was needlessly slow and tough. i am working on zero order optimization so i built this. mostly vibe coded but design choices were mine and yes i read every single line of code before …

dev.to — LLM tag TIER_1 한국어(KO) · HyunSeok Jeong · 2026-06-06 04:44

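100 Ad Copies with LLM + CTR Prediction: A 4-Step Workflow for Operators

"Please generate just 30 more copy variants for this campaign": this one line, once a marketer's standing request, has taken on a different meaning since GPT and Claude arrived. Now even 100 variants take five minutes. But if you actually load all 100 into the ads manager, the learning gets fragmented and similar copies cannibalize each other, ruining the results. This post is a 4-step pipeline for operators that takes LLM mass-produced copy through deduplication → pre-scoring → A/B candidate selection.…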

r/LocalLLaMA TIER_1 English(EN) · /u/Sisuuu · 2026-06-04 15:02

Qwen3.6-27B on 2x3090s: llama.cpp vs vLLM, all the flags, and the MTP acceptance/inference speed/context

written 20%-ish by me and 80% by Claude code. Spent basically a whole day getting my box to run Qwen3.6-27B as one OpenAI-compatible endpoint that hot-swaps between four quant/backend combos (llama.cpp Q6_K and Q8_0, vLLM INT4 and INT8). …

dev.to — LLM tag TIER_1 English(EN) · soy · 2026-06-02 21:33

Local LLM Advances: Holo3.1 Agents, Headroom Token Compression & Open-LLM-VTuber for Local Inference

Today's Highlights: This week's top stories highlight practical tools and techniques for enhancing local LLM performance and deployment, from efficient…

dev.to — LLM tag TIER_1 English(EN) · No One · 2026-06-02 19:00

Request-Based vs. Token-Based Pricing for LLM Inference in 2026

By 2026, the default assumption for LLM inference pricing is still token-based billing. You count input tokens, output tokens, and occasionally tokens spilled across tool calls or retrieval context. For short prompts this feels manageable, but as context windows stretch into t…

r/LocalLLaMA TIER_1 English(EN) · /u/yogthos · 2026-06-02 17:28

Putting Code Under a Microscope: Wavelet-Based Context for LLMs

Link: https://yogthos.net/posts/2026-06-02-wavescope.html

dev.to — LLM tag TIER_1 Русский(RU) · Promptra Team · 2026-06-01 19:17

Comparing the Top 5 LLMs of 2026: Pricing, Benchmarks, Real-World Use

If in 2024 the LLM API market could still be called an "OpenAI + Anthropic duopoly with Google catching up," by May 2026 the landscape has split into four distinct leagues: premium reasoning (Claude Opus 4.7, GPT-5.5), value tier with long context (Claude Sonnet 4.6, Gemini 3 Pro), a…

dev.to — LLM tag TIER_1 (CA) · TildAlice · 2026-06-01 15:04

LLM Tokenization: GPT vs. Claude vs. Llama Edge Cases

The 🤗 Emoji Cost Me $47 in API Calls. I ran a batch job that sent 10,000 user-generated messages to GPT-4. The average message was about 200 characters. I budgeted for ~50 tokens per message based on the "~4 characters per token" rule everyone quotes. Actual c…

dev.to — LLM tag TIER_1 English(EN) · Becomer.net · 2026-06-01 14:35

How I Built a Zero-Token Memory Layer for LLMs (and Why It Beats the Vector Store Approach)

If you've built an AI chatbot or agent, you've hit the same problem: the LLM forgets everything between sessions. The standard solution is to stuff your conversation history into a vector store and retrieve relevant chunks before each call. It works — but it has a hidden cost.…

dev.to — LLM tag TIER_1 English(EN) · Samir Yuja · 2026-06-01 13:17

Futbol Report: Building a Multi-Model LLM Comparison on AWS Lambda

Originally posted at samiryuja.dev (https://samiryuja.dev/blog/futbol-report-multi-model-eval). A few months ago I set up a soccer-digest bot that sends me a Telegram message every few days with fixtures, results, transfer …

dev.to — LLM tag TIER_1 English(EN) · globose technology solutions · 2026-06-01 10:45

Powering Next-Generation AI with High-Quality LLM Datasets

Introduction: Artificial Intelligence (AI) is rapidly transforming industries by enabling machines to understand, process, and generate human-like language. At the heart of this transformation are Large Language Models (LLMs), which power applications suc…

r/LocalLLaMA TIER_1 English(EN) · /u/Thrumpwart · 2026-05-31 23:14

Semantic Step Prediction: Multi-Step Latent Prediction in LLM Reasoning Trajectories via Step Sampling

Link: https://arxiv.org/abs/2604.18464

dev.to — LLM tag TIER_1 English(EN) · Ayi NEDJIMI · 2026-05-30 10:03

Pydantic AI vs. LangChain vs. instructor: Structured LLM Output Compared

Getting structured data out of a language model reliably is harder than it looks. The model might return JSON that's almost valid, skip required fields, or wrap the object in a markdown block. Three Python libraries try to solve this differently: instructor, …

dev.to — LLM tag TIER_1 Français(FR) · Paul SANTUS · 2026-05-29 12:41

Generating Structured Data with LLMs: A Few Tips for Better Reliability

LLMs are excellent at generating text. They are bad at generating structured data reliably. If you have ever tried to get an agent to produce a JSON object with a precise schema, you know the painful result: missing fields, keys h…

r/MachineLearning TIER_1 English(EN) · /u/averne_ · 2026-05-29 08:54

Building a Monokernel for LLM Inference on AMD MI300X: Up to 3,300 Output Tokens/s per Request [P]

We built a monokernel that runs the full decode sequence as one GPU-resident program on AMD MI300X, with some neat optimizations. The die topology is central to the result: we map memory access patterns to the physical layout, compute units group…

dev.to — LLM tag TIER_1 English(EN) · synthorai · 2026-05-27 15:30

LLM Prompt Caching: The Complete 2026 Guide

If you ship a chatbot, a RAG app, or an AI agent against a large language model, prompt caching is the single optimization that gives you back 50–90% of input cost and 3–10× of time-to-first-token at no quality cost. It isn't a bolt-on trick — it falls directl…

Mastodon — fosstodon.org TIER_1 Русский(RU) · [email protected] · 2026-05-27 07:22

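Scaling LLMs: From a Single Chip to the Data Center. Chapter 3. Transformers. This is a continuation of the series of articles on scaling LLM training and inference. Previous article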

[Translation] Scaling LLMs: From a Single Chip to the Data Center. Chapter 3. Transformers. This is a continuation of the series of articles on scaling LLM training and inference. Previous article. Now let's move on to something more practical, namely how many FLOPs and bytes are needed to run a transf…

Link: habr.com/…/1039208

dev.to — LLM tag TIER_1 English(EN) · Quratulain Nayeem · 2026-05-26 16:46

Beyond Prompts: Building a Four-Stage LLM Compiler with Surgical Self-Repair

A single prompt often yields inconsistent, unvalidated AI output. To fix this, I built Compyl, a multi-stage LLM compiler that takes plain English input and converts it into a directly usable JSON blueprint. Compyl converts plain English into a complete, valid…

dev.to — LLM tag TIER_1 English(EN) · pixelbank dev · 2026-05-25 23:10

Applications of LLMs: Deep Dive + Problem: Information Gain

A daily deep dive into LLM topics, coding problems, and platform features from PixelBank (https://pixelbank.dev). Topic Deep Dive: Applications of LLMs. From the Introduction to LLMs chapter. Intr…

dev.to — LLM tag TIER_1 English(EN) · David Moores · 2026-05-25 18:33

Benchmarking LLM Structured Outputs

Cross-posted from carrick.tools (https://carrick.tools/blog/benchmarking-llm-structured-outputs/). When you read the API documentation for OpenAI, Anthropic, or Google Gemini, the feature called "structure…

dev.to — LLM tag TIER_1 English(EN) · Mustafa ERBAY · 2026-05-22 16:11

LLM Inference Caching: How to Balance Cost and Latency?

Introduction to LLM Inference Caching: Why It Matters. When working with Large Language Models (LLMs), especially as you start using them in production environments, one of the first major challenges you'll face is the delicate balance between cost and latency. LLMs…

dev.to — LLM tag TIER_1 English(EN) · Nishkarsh Sahu · 2026-05-19 18:33

Building a Native Rails AI Abstraction Layer for Local and Hosted LLMs

Recently I've been experimenting with integrating local AI runtimes into Rails applications using tools like Ollama and LM Studio. At first, the integration looked straightforward: make an HTTP request, stream the response, and return the generated text. Bu…

dev.to — LLM tag TIER_1 English(EN) · Kotcherla Murali Krishna · 2026-05-19 17:43

Building a Modular LLM Inference Engine from Scratch

Why vLLM, TensorRT-LLM, and llama.cpp each solve only part of the problem, and how I built inferx to fill the gap. Runs on any laptop, no GPU needed. I spent the last few months building inferx, an open-source LLM inference optimization library that runs on any machin…

Mastodon — mastodon.social TIER_1 Deutsch(DE) · aisyndicate · 2026-06-03 14:30

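LLM Inference, Quantization, and Local AI: Where Quality Is Really Lost. Quantization often seems harmless, but flips show that the same accuracy can hide different behavior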

LLM Inference, Quantization, and Local AI: Where Quality Is Really Lost. Quantization often seems harmless, but flips show that the same accuracy can hide different behavior. For local AI, drift matters more than benchmarks. https://aisyndicate.ch/llm-inferenz-quantisierung-…

Link: aisyndicate.ch/llm-inferenz-quantisierung…

Mastodon — mastodon.social TIER_1 Deutsch(DE) · [email protected] · 2026-05-30 16:04

RT @PavloMolchanov: 🚀 Self-speculation enables a real 6.75x speedup of LLM generation with SGLang inference! More at Arint.info #AI #Di

RT @PavloMolchanov: 🚀 Self-speculation enables a real 6.75x speedup of LLM generation with SGLang inference! More at Arint.info #AI #Diffusion #LLM #MachineLearning #Nemotron #SGLang #arint_info https://x.com/PavloMolchanov/status/2060245957254824246…

Mastodon — mastodon.social TIER_1 Deutsch(DE) · [email protected] · 2026-05-29 16:03

RT @PavloMolchanov: 🚀 Self-speculation enables a real 6.75x speedup of LLM generation with SGLang inference! More at Arint.info #AI #Dee

RT @PavloMolchanov: 🚀 Self-speculation enables a real 6.75x speedup of LLM generation with SGLang inference! More at Arint.info #AI #DeepLearning #Innovation #LLM #MachineLearning #NLP #arint_info https://x.com/PavloMolchanov/status/206024595725482424…

Mastodon — mastodon.social TIER_1 Русский(RU) · [email protected] · 2026-05-25 07:22

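Scaling LLMs: From a Single Chip to the Data Center. Chapter 2. Sharding. This is a continuation of the series of articles on scaling LLM training and inference. Previous chapter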

[Translation] Scaling LLMs: From a Single Chip to the Data Center. Chapter 2. Sharding. This is a continuation of the series of articles on scaling LLM training and inference. The previous chapter is available at this link. So, with the basics covered, let's now work out how to spread the matrices across …

Link: habr.com/…/1037918

r/singularity TIER_2 English(EN) · /u/yogthos · 2026-06-24 19:07

DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference

https://www.reddit.com/r/singularity/comments/1uemrgy/dualpath_breaking_the_storage_bandwidth/

r/singularity TIER_2 English(EN) · /u/Distinct-Question-16 · 2026-06-24 14:07

OpenAI and Broadcom Release an LLM-Optimized Inference Chip

“We optimized the architecture around the kernels, memory movement, networking, and serving patterns that matter most for frontier AI models. Based on early testing, Jalapeño will efficiently execute our most important workloads close to the hard…

Coverage sources [564]