PulseAugur

New simulators and frameworks enhance LLM training, inference, and fine-tuning

Researchers have developed several new tools and frameworks to improve the efficiency and accuracy of large language model (LLM) operations. Charon and Frontier are simulators designed to predict LLM training and inference performance with high accuracy, aiding in optimization efforts. FT-Dojo provides a benchmark environment for autonomous LLM fine-tuning, while rePIRL offers an inverse RL-inspired framework for learning process reward models. Additionally, PALS focuses on power-aware LLM serving for Mixture-of-Experts models, and LlamaWeb enables memory-efficient LLM inference in web browsers using WebGPU.

Impact: New simulators and frameworks promise more efficient, accurate, and power-aware LLM operations, potentially accelerating research and deployment.

Ranking rationale: Multiple research papers introducing new simulators, frameworks, and techniques for LLM training, inference, and fine-tuning.

Read on Hugging Face Daily Papers →

AI-generated summary · Google Gemini · From 114 sources. How we write summaries →

Sources [114]

  1. arXiv cs.LG TIER_1 English(EN) · Enayat Ullah, Sai Aparna Aketi, Devansh Gupta, Huanyu Zhang, Meisam Razaviyayn ·

    Efficient DP-SGD for LLMs with Randomized Clipping

    arXiv:2605.24879v1 Announce Type: new Abstract: Large language models (LLMs) are trained on vast datasets that may contain sensitive information. Differential privacy (DP), the de facto standard for formal privacy guarantees, provides a principled framework for training LLMs with…

  2. arXiv cs.CL TIER_1 English(EN) · Peijie Jiang, Yuqi Feng, Cunyin Peng, Qian Zhao, Jia Liu, KunLong Chen, Zhiqiang Zhang, Jun Zhou ·

    PowLU: An Activation Function for Stable Pre-Training of LLMs

    arXiv:2605.25704v1 Announce Type: new Abstract: In contemporary large language models (LLMs), the swish-gated linear unit (SwiGLU) activation function is widely adopted to regulate the information flow and introduce non-linearity. For large positive inputs, SwiGLU approximates th…

  3. arXiv cs.CL TIER_1 English(EN) · Xiangdong Zhang, Debing Zhang, Shaofeng Zhang, Xiaohan Qin, Yu Cheng, Junchi Yan ·

    NITP: Next Implicit Token Prediction for LLM Pre-training

    arXiv:2605.24956v1 Announce Type: new Abstract: Standard next-token prediction (NTP) supervises language models solely through discrete labels in the output logit space. We argue that this sparse one-hot supervision leaves the latent representation space under-constrained, allowi…

  4. arXiv cs.AI TIER_1 English(EN) · Siyuan Liu, Tinghong Chen, Xinghan Li, Yifei Wang, Jingzhao Zhang ·

    Data Difficulty and the Generalization–Extrapolation Tradeoff in LLM Fine-Tuning

    arXiv:2605.12906v2 Announce Type: replace-cross Abstract: Data selection during supervised fine-tuning (SFT) can critically change the behavior of large language models (LLMs). Although existing work has studied the effect of selecting data based on heuristics such as perplexity,…

  5. arXiv cs.AI TIER_1 English(EN) · Ruishuo Chen, Yu Chen, Zhuoran Li, Longbo Huang ·

    PowerFlow: Unlocking the Dual Nature of LLMs via Principled Distribution Matching

    arXiv:2603.18363v2 Announce Type: replace-cross Abstract: Unsupervised Reinforcement Learning from Internal Feedback (RLIF) has emerged as a promising paradigm for eliciting the latent capabilities of Large Language Models (LLMs) without external supervision. However, current met…

  6. arXiv cs.AI TIER_1 English(EN) · Haojie Ouyang, Jianwei Lv, Lei Ren, Chen Wei, Xiaojie Wang, Fangxiang Feng ·

    ChunkLLM: A Lightweight Pluggable Framework for Accelerating LLMs Inference

    arXiv:2510.02361v2 Announce Type: replace-cross Abstract: Transformer-based large models excel in natural language processing and computer vision, but face severe computational inefficiencies due to the self-attention's quadratic complexity with input tokens. Recently, researcher…

  7. arXiv cs.AI TIER_1 English(EN) · Muyu Pan, Shu Zhao, Nan Zhang, Philip Shin, Varun Parekh, Vijaykrishnan Narayanan, Rui Zhang ·

    TIAR: Trajectory-Informed Advantage Reweighting for LLM Abstention Learning

    arXiv:2605.25850v1 Announce Type: cross Abstract: This paper investigates large language model (LLM) abstention learning, specifically using ternary reward, which incentivize truthfulness in large language models. This paper extends that idea by moving from a ternary reward to a …

  8. arXiv cs.AI TIER_1 English(EN) · Haoran Gu, Handing Wang, Yi Mei, Mengjie Zhang ·

    AutoSG: LLM-Driven Solver Generation Solely from Task Prompts for Expensive Optimization

    arXiv:2605.25658v1 Announce Type: cross Abstract: Expensive optimization tasks are ubiquitous in real-world applications, demanding highly specialized solvers. While LLM-driven automated solver generation shows promise, current paradigms face three critical issues when tackling e…

  9. arXiv cs.AI TIER_1 English(EN) · Xiangtian Ji, Yuxin Chen, Zhengzhou Cai, Xiang Wang, An Zhang, Tat-Seng Chua ·

    Tiny Brains, Giant Impact: Uncovering the Keystone Neurons of LLM with Just a Few Prompts

    arXiv:2605.24846v1 Announce Type: cross Abstract: Large language models (LLMs) display strong comprehensive abilities, yet the internal mechanisms that support these behaviors remain insufficiently understood. In this work, we show that across a wide range of open-weight Transfor…

  10. arXiv cs.AI TIER_1 English(EN) · Jaeung Lee, Dohyun Kim, Jaemin Jo ·

    Measuring the Depth of LLM Unlearning via Activation Patching

    arXiv:2605.24614v1 Announce Type: cross Abstract: Large language model (LLM) unlearning has emerged as a crucial post-hoc mechanism for privacy protection and AI safety, yet auditing whether target knowledge is truly erased remains challenging. Existing output-level metrics fail …

  11. arXiv cs.AI TIER_1 English(EN) · Haizhou Xia ·

    Guarded Repair for Harm-Aware Post-hoc Replacement of LLM Mathematical Reasoning

    arXiv:2605.24613v1 Announce Type: cross Abstract: Post-hoc repair of LLM mathematical reasoning introduces an asymmetric risk: fixing an incorrect reasoning trace is useful, but replacing a trace that was already correct can be harmful. We study this problem under a selective rep…

  12. arXiv cs.AI TIER_1 English(EN) · João Sedoc, Baotong Zhang, Dean Foster ·

    Trust but Verify: Prover-Verifier Deliberation for Selective LLM Prediction

    arXiv:2605.25133v1 Announce Type: new Abstract: Reliably knowing when a language model is correct is almost as important as being correct. We introduce prover-verifier deliberation (PVD), an inference-time protocol grounded in interactive proof theory, as a mechanism for selectiv…

  13. arXiv cs.AI TIER_1 English(EN) · Jingchu Gai, Guanning Zeng, Christina Baek, Chen Wu, J. Zico Kolter, Andrej Risteski, Aditi Raghunathan ·

    Understanding and Mitigating Premature Confidence for Better LLM Reasoning

    arXiv:2605.24396v1 Announce Type: new Abstract: Long chains of thought (CoT) from current language models frequently contain logical gaps and unjustified leaps, limiting the gains from additional test-time compute. Improving reasoning quality directly would require process reward…

  14. arXiv cs.AI TIER_1 English(EN) · Ashok Chandrasekar, Jason Kramberger ·

    Identifying and Mitigating Systemic Measurement Bias in Production LLM Inference Benchmarks

    arXiv:2605.24217v1 Announce Type: new Abstract: As Large Language Models (LLMs) transition from research environments to production deployments, evaluating their performance against strict Service Level Objectives (SLOs) has become critical. However, current evaluation methodolog…

  15. arXiv cs.AI TIER_1 English(EN) · Minwei Kong, Chonghe Jiang, Ao Qu, Wenbin Ouyang, Zhaoming Zeng, Xiaotong Guo, Zhekai Li, Junyi Li, Yi Fan, Xinshou Zheng, Xi Jing, Yikai Zhang, Zhiwei Liang, Seonghoo Kim, Runqing Yang, Zijian Zhou, Sirui Li, Han Zheng, Wangyang Ying, Ou Zheng, Chonghua… ·

    FrontierOR: Benchmarking LLMs' Capacity for Efficient Algorithm Design in Large-Scale Optimization

    arXiv:2605.25246v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used for optimization modeling and solver-code generation, yet practical operations research and optimization problems often require a harder capability: designing scalable algorithms th…

  16. arXiv cs.LG TIER_1 English(EN) · Haoyu Zheng, Yongqiang Zhang, Fangcheng Fu, Xiaokai Zhou, Hao Luo, Hongchao Zhu, Yuanyuan Zhu, Hao Wang, Xiao Yan, Jiawei Jiang ·

    Scheduling LLM Inference with Uncertainty-Aware Output Length Predictions

    arXiv:2604.00499v2 Announce Type: replace Abstract: To schedule LLM inference, the shortest job first (SJF) principle is favorable by prioritizing requests with short output lengths to avoid head-of-line (HOL) blocking. Existing methods usually predict a single output le…

  17. arXiv cs.LG TIER_1 English(EN) · Daniel Barley, Jonathan Leis, Benjamin Klenk, Holger Fröning ·

    A Tabular Schedule Abstraction for Communication-Aware Evaluation of Pipeline-Parallel LLM Training

    arXiv:2605.24006v1 Announce Type: cross Abstract: Pipeline parallelism is a key technique for distributed training of large language models because it reduces per-device parameter and activation memory. However, comparing pipeline schedules is difficult: analytical models expose …

  18. arXiv cs.LG TIER_1 English(EN) · Zili Zhang, Chengxu Yang, Shenglong Zhang, Chenyu Wang, Yufan Zhang, Tuo Dai, Zhouyang Li, Yuhong Ge, Chao Jin, Xin Jin, Yuliang Liu ·

    BigMac: Breaking the Pareto Frontier of Compute and Memory in Multimodal LLM Training

    arXiv:2605.25451v1 Announce Type: new Abstract: Training multimodal large language models (MLLMs) is challenged by both model and data heterogeneity. Existing systems redesign the training pipeline to address these challenges, but remain bound by a Pareto frontier between compute…

  19. arXiv cs.LG TIER_1 English(EN) · Ke Sun, Yizhou Zhao, Jiayi Xin, Qi Long, Weijie Su ·

    CurveRL: Principled Distribution-Aware Context Reweighting for LLM Reasoning

    arXiv:2605.24331v1 Announce Type: new Abstract: Context or prompt-level reweighting has emerged as a central algorithmic lever in Reinforcement Learning with Verified Rewards (RLVR) for improving the reasoning capability of large language models, yet the principle determining wha…

  20. arXiv cs.AI TIER_1 English(EN) · Zhuchen Cao, Sven Apel, Adish Singla, Vera Demberg ·

    Pragmatic Reasoning improves LLM Code Generation

    arXiv:2502.15835v5 Announce Type: replace-cross Abstract: Pragmatic reasoning helps interlocutors infer intended meaning from ambiguous or underspecified messages by considering shared context and counterfactual alternatives. Similar challenges arise in natural language-to-code g…

  21. arXiv cs.AI TIER_1 English(EN) · Akira Okutomi ·

    False Fixed Points: Kantian Feedback, Stable Miscalibration, and Representational Compression in LLMs

    arXiv:2510.14925v4 Announce Type: replace Abstract: High-confidence errors in large language models are often treated as fragile failures. We study an alternative: some errors may be false fixed points, locally stable, internally coherent, and confidently wrong. This separates ro…

  22. arXiv cs.AI TIER_1 English(EN) · Parth Darshan, Abhishek Divekar ·

    When Gradients Collide: Failure Modes of Multi-Objective Prompt Optimization for LLM Judges

    arXiv:2605.26046v1 Announce Type: cross Abstract: Customizing an LLM judge to a specific task or domain often involves optimizing its prompt across multiple evaluation criteria simultaneously. Textual gradient methods automate this for a single judge criterion, however they produ…

  23. arXiv cs.AI TIER_1 English(EN) · Abhishek Divekar ·

    When Gradients Collide: Failure Modes of Multi-Objective Prompt Optimization for LLM Judges

    Customizing an LLM judge to a specific task or domain often involves optimizing its prompt across multiple evaluation criteria simultaneously. Textual gradient methods automate this for a single judge criterion, however they produce natural-language critiques, not numerical vecto…

  24. arXiv cs.AI TIER_1 English(EN) · Rui Zhang ·

    TIAR: Trajectory-Informed Advantage Reweighting for LLM Abstention Learning

    This paper investigates large language model (LLM) abstention learning, specifically using ternary reward, which incentivize truthfulness in large language models. This paper extends that idea by moving from a ternary reward to a Trajectory-Informed advantage reweighting, dynamic…

  25. arXiv cs.LG TIER_1 English(EN) · Jun Zhou ·

    PowLU: An Activation Function for Stable Pre-Training of LLMs

    In contemporary large language models (LLMs), the swish-gated linear unit (SwiGLU) activation function is widely adopted to regulate the information flow and introduce non-linearity. For large positive inputs, SwiGLU approximates the quadratic function $x^2$, providing strong non…

  26. arXiv cs.AI TIER_1 English(EN) · Mengjie Zhang ·

    AutoSG: LLM-Driven Solver Generation Solely from Task Prompts for Expensive Optimization

    Expensive optimization tasks are ubiquitous in real-world applications, demanding highly specialized solvers. While LLM-driven automated solver generation shows promise, current paradigms face three critical issues when tackling expensive optimization: factual hallucinations due …

  27. arXiv cs.AI TIER_1 English(EN) · Zihao Jing, Qiuhao Zeng, Ruiyi Fang, Yan Yi Li, Yan Sun, Boyu Wang, Pingzhao Hu ·

    Scaling-Aware Adapter for Structure-Grounded LLM Reasoning

    arXiv:2602.02780v3 Announce Type: replace Abstract: Large language models (LLMs) are enabling reasoning over 2D and 3D structures, yet existing methods remain modality-specific and typically compress structural inputs through sequence-based tokenization or fixed-length query conn…

  28. arXiv cs.AI TIER_1 English(EN) · Chuyifei Zhang, Hongyu Cui, Xiaowen Huang, Jitao Sang ·

    Positional Failures in Long-Context LLMs: A Blind Spot in Reasoning Benchmarks

    arXiv:2605.23170v1 Announce Type: cross Abstract: Position-controlled evaluation is standard for retrieval tasks such as Needle-in-a-Haystack and RULER, but mainstream reasoning benchmarks do not control positional placement of target tasks in long contexts. We audit 11 long-cont…

  29. arXiv cs.AI TIER_1 English(EN) · Luze Sun, Anshuman Suri, Harsh Chaudhari, Cristina Nita-Rotaru, Alina Oprea ·

    PoisonForge: Task-Level Targeted Poisoning Benchmark for Instruction-Tuned LLMs

    arXiv:2605.23168v1 Announce Type: cross Abstract: When practitioners fine-tune LLMs on unvetted datasets, an adversary can exploit the data supply chain through task-level poisoning: inserting a small number of crafted instruction-response pairs that cause the model to embed atta…

  30. arXiv cs.AI TIER_1 English(EN) · Thanh Q. Tran, Arun Verma, Kiwan Wong, Bryan Kian Hsiang Low, Daniela Rus, Wei Xiao ·

    BarrierSteer: LLM Safety via Learning Barrier Steering

    arXiv:2602.20102v2 Announce Type: replace-cross Abstract: Despite the strong performance of large language models (LLMs) across diverse tasks, their susceptibility to adversarial attacks and unsafe content generation remains a significant obstacle to deployment, particularly in h…

  31. arXiv cs.LG TIER_1 English(EN) · Wei Lin, Yining Jiang, Qingyu Song, Qiao Xiang, Hong Xu ·

    AGZO: Activation-Guided Zeroth-Order Optimization for LLM Fine-Tuning

    arXiv:2601.17261v4 Announce Type: replace Abstract: Zeroth-Order (ZO) optimization has emerged as a promising solution for fine-tuning LLMs under strict memory constraints, as it avoids the prohibitive memory cost of storing activations for backpropagation. However, existing ZO m…

  32. arXiv cs.AI TIER_1 English(EN) · Sixing Chen, Ji-An Li, Saner Cakir, Sinan Akcali, Kayla Lee, Marcelo G. Mattar ·

    Extracting Search Trees from LLM Reasoning Traces Reveals Myopic Planning

    arXiv:2605.06840v5 Announce Type: replace Abstract: Large language models (LLMs), especially reasoning models, generate extended chain-of-thought (CoT) reasoning that often contains explicit deliberation over future outcomes. Yet whether this deliberation constitutes genuine plan…

  33. arXiv cs.LG TIER_1 English(EN) · Mohammad R. Rezaei, Rahul G. Krishnan ·

    From Residuals to Reasons: LLM-Guided Mechanism Inference from Tabular Data

    arXiv:2605.22897v1 Announce Type: new Abstract: A persistent challenge in machine learning for scientific applications is jointly achieving prediction and understanding. Statistical models excel on structured data but operate as black boxes, while existing interpretability method…

  34. arXiv cs.AI TIER_1 English(EN) · Ziyue Liu, Zhengyang Wang, Ruijie Zhang, Avinash Maurya, Hui Zhou, Paul Hovland, Sheng Di, Franck Cappello, Bogdan Nicolae, Zheng Zhang ·

    ReCoVer: Resilient LLM Pre-Training System via Fault-Tolerant Collective and Versatile Workload

    arXiv:2605.11215v2 Announce Type: replace-cross Abstract: Pre-training large language models on massive GPU clusters has made hardware faults routine rather than rare, driving the need for resilient training systems. Yet existing frameworks either focus on specific parallelism sc…

  35. arXiv cs.AI TIER_1 English(EN) · Yiwen Duan, Jing Ye, Xinpei Zhao ·

    ALIVE: Awakening LLM Reasoning via Adversarial Learning and Instructive Verbal Evaluation

    arXiv:2602.05472v2 Announce Type: replace Abstract: The quest for expert-level reasoning in Large Language Models (LLMs) has been hampered by a persistent reward bottleneck: traditional reinforcement learning (RL) relies on scalar rewards that are costly to scal…

  36. arXiv cs.LG TIER_1 English(EN) · Yu Li, Rui Miao, Tian Lan, Zhengling Qi ·

    OPPO: Bayesian Value Recursion for Token-Level Credit Assignment in LLM Reasoning

    arXiv:2605.21851v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards has become the standard recipe for improving LLM reasoning, but the dominant algorithm GRPO assigns a single trajectory-level advantage to every token, diluting the signal at pivotal re…

  37. arXiv cs.AI TIER_1 English(EN) · Akshay Manglik, Apaar Shanker, Kaustubh Deshpande, Jason Qin, Yash Maurya, Veronica Chatrath, Vijay S. Kalmath, Levi Lentz, Yuan (Emily) Xue ·

    Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents

    arXiv:2605.21347v2 Announce Type: new Abstract: Diagnosing failures in LLM agents remains largely manual. Practitioners inspect a small subset of execution traces, form ad-hoc hypotheses, and iterate. This process misses patterns that only emerge across trace populations and does…

  38. arXiv cs.AI TIER_1 English(EN) · Can Hankendi, Rana Shahout, Minlan Yu, Ayse K. Coskun ·

    PALS: Power-Aware LLM Serving for Mixture-of-Experts Models

    arXiv:2605.21427v1 Announce Type: new Abstract: Large language model (LLM) inference has become a dominant workload in modern data centers, driving significant GPU utilization and energy consumption. While prior systems optimize throughput and latency by batching, scheduling, and…

  39. arXiv cs.AI TIER_1 English(EN) · Aisvarya Adeseye, Jouni Isoaho, Adeyemi Adeseye ·

    Parallel LLM Reasoning for Bias-Resilient, Robust Conceptual Abstraction

    arXiv:2605.20194v1 Announce Type: cross Abstract: Large language models (LLMs) have been increasingly used to analyze text. However, they are often plagued with contextual reasoning limitations when analyzing long documents. When long documents are processed sequentially, early o…

  40. arXiv cs.AI TIER_1 English(EN) · Reese Levine, Rithik Sharma, Nikhil Jain, Abhijit Ramesh, Zheyuan Chen, Neha Abbas, James Contini, Tyler Sorensen ·

    Llamas on the Web: Memory-Efficient, Performance-Portable, and Multi-Precision LLM Inference with WebGPU

    arXiv:2605.20706v1 Announce Type: cross Abstract: Running language models in the browser presents a unique opportunity to build efficient, private, and portable AI applications, but requires contending with constrained memory availability and heterogeneous hardware targets. To re…

  41. arXiv cs.AI TIER_1 English(EN) · Yicheng Feng, Xin Tan, Yangtao Deng, Yimin Jiang, Yibo Zhu, Hong Xu ·

    Frontier: Towards Comprehensive and Accurate LLM Inference Simulation

    arXiv:2605.21312v1 Announce Type: cross Abstract: Modern LLM serving is no longer homogeneous or monolithic. Production systems now combine disaggregated execution, complex parallelism, runtime optimizations, and stateful workloads such as reasoning, agents, and RL rollouts. Simu…

  42. arXiv cs.AI TIER_1 English(EN) · Jaemin Kim, Hangeol Chang, Hyunmin Hwang, Choonghan Kim, Jong Chul Ye ·

    Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs

    arXiv:2505.19075v3 Announce Type: replace Abstract: Large Language Models (LLMs) have demonstrated remarkable general capabilities, but enhancing skills such as reasoning often demands substantial computational resources and may compromise generalization. While Parameter-Efficien…

  43. arXiv cs.AI TIER_1 English(EN) · Qizheng Li, Yifei Zhang, Xiao Yang, Xu Yang, Zhuo Wang, Weiqing Liu, Jiang Bian ·

    FT-Dojo: Towards Autonomous LLM Fine-Tuning with Language Agents

    arXiv:2603.01712v2 Announce Type: replace Abstract: Fine-tuning large language models for vertical domains remains labor-intensive, requiring practitioners to curate data, configure training, and iteratively diagnose model behavior. Despite growing interest in autonomous machine …

  44. arXiv cs.AI TIER_1 English(EN) · Xian Wu, Kaijie Zhu, Ying Zhang, Lun Wang, Wenbo Guo ·

    rePIRL: Learn PRM with Inverse RL for LLM Reasoning

    arXiv:2602.07832v2 Announce Type: replace-cross Abstract: Process rewards have been widely used in deep reinforcement learning to improve training efficiency, reduce variance, and prevent reward hacking. In LLM reasoning, existing works also explore various solutions for learning…

  45. arXiv cs.AI TIER_1 English(EN) · Mengtian Yang, Zhekun Zhang, Mingheng Wu, Jianwen Yan, Hanshi Sun, Li-wen Chang ·

    Charon: A Unified and Fine-Grained Simulator for Large-Scale LLM Training and Inference

    arXiv:2605.17164v2 Announce Type: replace-cross Abstract: Deploying large-scale LLM training and inference with optimal performance is exceptionally challenging due to a complex design space of parallelism strategies, system optimizations, and hardware configurations. Accurate an…

  46. arXiv cs.AI TIER_1 English(EN) · Zikai Alex Wen ·

    Toward User Comprehension Supports for LLM Agent Skill Specifications

    arXiv:2605.19362v2 Announce Type: replace-cross Abstract: Users often interpret and select agent skills through their SKILL markdown specifications. To protect users, existing audits mainly focus on malicious or unsafe skills. We study the complementary question of whether specif…

  47. arXiv cs.CL TIER_1 English(EN) · Zhenwei Tang, Zhaoyan Liu, Rasa Hosseinzadeh, Tongzi Wu, Keyvan Golestan, Jesse C. Cresswell ·

    RankJudge: A Multi-Turn LLM-as-a-Judge Synthetic Benchmark Generator

    arXiv:2605.21748v1 Announce Type: new Abstract: As interactive LLM-based applications are created and refined, model developers need to evaluate the quality of generated text along many possible axes. For simpler systems, human evaluation may be practical, but in complicated syst…

  48. arXiv cs.CL TIER_1 English(EN) · Xiaoyuan Li, Yubo Ma, Chengpeng Li, Fengbin Zhu, Yiyao Yu, Keqin Bao, Wenjie Wang, Fuli Feng, Dayiheng Liu ·

    Unified Data Selection for LLM Reasoning

    arXiv:2605.22389v1 Announce Type: new Abstract: Effectively training Large Language Models (LLMs) for complex, long-CoT reasoning is often bottlenecked by the need for massive high-quality reasoning data. Existing methods are either computationally expensive or fail to reliably d…

  49. arXiv cs.CL TIER_1 English(EN) · Arip Asadulaev, Daniil Ognev, Karim Salta, Martin Takac ·

    Value-Gradient Hypothesis of RL for LLMs

    arXiv:2605.21654v1 Announce Type: cross Abstract: Reinforcement learning substantially improves pretrained language models, but it remains understudied why critic-free methods such as PPO and GRPO work as well as they do, and when they should provide the largest gains. We develop…

  50. arXiv cs.CL TIER_1 English(EN) · Xing Zhang, Yanwei Cui, Guanghui Wang, Ziyuan Li, Wei Qiu, Bing Zhu, Peiyang He ·

    Ratchet: A Minimal Hygiene Recipe for Self-Evolving LLM Agents

    arXiv:2605.22148v1 Announce Type: cross Abstract: Self-evolving skill libraries, pioneered by Voyager, let frozen LLM agents accumulate reusable knowledge without weight updates, yet recent evaluation shows that LLM-authored skills deliver $+0.0$pp over no-skill baselines while h…

  51. arXiv cs.CL TIER_1 English(EN) · Fengfei Yu, Ruijia Niu, Dongxia Wu, Yian Ma, Rose Yu ·

    Calibrating LLMs with Semantic-level Reward

    arXiv:2605.15588v2 Announce Type: replace Abstract: As large language models (LLMs) are deployed in consequential settings such as medical question answering and legal reasoning, the ability to estimate when their outputs are likely to be correct is essential for safe and reliabl…

  52. arXiv cs.CL TIER_1 English(EN) · Alexandre Cristovão Maiorano ·

    LLM Readiness Harness: Evaluation, Observability, and CI Gates for LLM/RAG Applications

    arXiv:2603.27355v2 Announce Type: replace-cross Abstract: We present a readiness harness for LLM and RAG applications that turns evaluation into a deployment decision workflow. The system combines automated benchmarks, OpenTelemetry observability, and CI quality gates under a min…

  53. arXiv cs.LG TIER_1 English(EN) · Andy Han, Kristina Fujimoto, Avidan Shah, Kiet Nguyen, Kai Xu, Chen Yueh-Han, Ilia Sucholutsky, Rico Angell ·

    On-Policy Consistency Training Improves LLM Safety with Minimal Capability Degradation

    arXiv:2605.21834v1 Announce Type: new Abstract: Aligned models can misbehave in several ways: they are often sycophantic, fall victim to jailbreaks, or fail to include appropriate safety warnings. Consistency training is a promising new alignment paradigm to mitigate such failure…

  54. arXiv cs.LG TIER_1 English(EN) · Yifan Lan, Yuanpu Cao, Hanyu Wang, Lu Lin, Jinghui Chen ·

    The Illusion of Reasoning: Exposing Evasive Data Contamination in LLMs via Zero-CoT Truncation

    arXiv:2605.21856v1 Announce Type: new Abstract: Large language models (LLMs) have demonstrated impressive reasoning abilities across a wide range of tasks, but data contamination undermines the objective evaluation of these capabilities. This problem is further exacerbated by mal…

  55. arXiv cs.LG TIER_1 English(EN) · Jialin Chen, Aosong Feng, Harshit Verma, Siyi Gu, Haiwen Wang, Ali Maatouk, Yixuan He, Yifeng Gao, Leandros Tassiulas, Rex Ying ·

    Reasoning through Verifiable Forecast Actions: Consistency-Grounded RL for Financial LLMs

    arXiv:2605.21975v1 Announce Type: new Abstract: Financial markets are characterized by extreme non-stationarity, low signal-to-noise ratios, and strong dependence on external information such as news, company fundamentals, and macroeconomic signals. Yet, existing approaches eithe…

  56. arXiv cs.LG TIER_1 English(EN) · Shuo Yang, Jinda Lu, Kexin Huang, Chiyu Ma, Shaohang Wei, Yuyang Liu, Guoyin Wang, Jingren Zhou, Li Yuan ·

    One-Way Policy Optimization for Self-Evolving LLMs

    arXiv:2605.22156v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has become a promising paradigm for scaling reasoning capabilities of Large Language Models (LLMs). However, the sparsity of binary verifier rewards often leads to low efficiency…

  57. arXiv cs.LG TIER_1 English(EN) · Manuel Noah Riesen, Peter Alfred von Niederhäusern ·

    Reinforced Graph of Thoughts: RL-Driven Adaptive Prompting for LLMs

    arXiv:2605.22195v1 Announce Type: new Abstract: Graph of Thoughts (GoT), a generalized form of recent prompting paradigms for large language models (LLMs), has been shown to be useful for elaborate problem solving. By executing a graph of operations, thoughts of the LLM are struc…

  58. arXiv cs.LG TIER_1 English(EN) · Hongbin Zhang, Chaozheng Wang, Kehai Chen, Youcheng Pan, Yang Xiang, Jinpeng Wang, Min Zhang ·

    Tailoring Teaching to Aptitude: Direction-Adaptive Self-Distillation for LLM Reasoning

    arXiv:2605.22263v1 Announce Type: new Abstract: On-policy self-distillation (OPSD) is an emerging LLM post-training paradigm in which the model serves as its own teacher: conditioned on privileged information such as a reference trace or hint, the same policy provides dense token…

  59. arXiv cs.LG TIER_1 English(EN) · Di He, Songjun Tu, Keyu Wang, Lu Yin, Shiwei Liu ·

    One LR Doesn't Fit All: Heavy-Tail Guided Layerwise Learning Rates for LLMs

    arXiv:2605.22297v1 Announce Type: new Abstract: Learning rate configuration is a fundamental aspect of modern deep learning. The prevailing practice of applying a uniform learning rate across all layers overlooks the structural heterogeneity of Transformers, potentially limiting …

  60. arXiv cs.LG TIER_1 English(EN) · Athanasios Glentis, Jiaxiang Li, Andi Han, Mingyi Hong ·

    Memory-Efficient LLM Pretraining via Minimalist Optimizer Design

    arXiv:2506.16659v3 Announce Type: replace Abstract: Training large language models (LLMs) relies on adaptive optimizers such as Adam, which introduce extra operations and require significantly more memory to maintain first- and second-order moments than SGD. While recent works su…

  61. arXiv cs.LG TIER_1 English(EN) · Tom Segal, Asaf Shabtai, Yuval Elovici ·

    Provably Protecting Fine-Tuned LLMs from Training Data Extraction while Preserving Utility

    arXiv:2602.00688v2 Announce Type: replace Abstract: Fine-tuning large language models (LLMs) on sensitive datasets raises privacy concerns, as training data extraction (TDE) attacks can expose highly confidential information. Existing defenses against such attacks either lack for…

  62. arXiv cs.LG TIER_1 English(EN) · Rosie Zhao, Anshul Shah, Xiaoyu Zhu, Xinke Deng, Zhongyu Jiang, Yang Yang, Joerg Liebelt, Arnab Mondal ·

    On Robustness and Chain-of-Thought Consistency of RL-Finetuned VLMs

    arXiv:2602.12506v3 Announce Type: replace Abstract: Reinforcement learning (RL) finetuning has become a key technique for enhancing large language models (LLMs) on reasoning-intensive tasks, motivating its extension to vision-language models (VLMs). While RL-tuned VLMs improve on…

  63. arXiv cs.LG TIER_1 English(EN) · Huilin Zhou, Jian Zhao, Yilu Zhong, Zhen Liang, Xiuyuan Chen, Yuchen Yuan, Tianle Zhang, Chi Zhang, Lan Zhang, Xuelong Li ·

    Metis: Learning to Jailbreak LLMs via Self-Evolving Metacognitive Policy Optimization

    arXiv:2605.10067v3 Announce Type: replace Abstract: Red teaming is critical for uncovering vulnerabilities in Large Language Models (LLMs). While automated methods have improved scalability, existing approaches often rely on static heuristics or stochastic search, rendering them …

  64. arXiv cs.CL TIER_1 English(EN) · Jitao Sang ·

    Positional Failures in Long-Context LLMs: A Blind Spot in Reasoning Benchmarks

    Position-controlled evaluation is standard for retrieval tasks such as Needle-in-a-Haystack and RULER, but mainstream reasoning benchmarks do not control positional placement of target tasks in long contexts. We audit 11 long-context benchmarks and find none jointly controls task…

  65. arXiv cs.CL TIER_1 English(EN) · Dayiheng Liu ·

    Unified Data Selection for LLM Reasoning

    Effectively training Large Language Models (LLMs) for complex, long-CoT reasoning is often bottlenecked by the need for massive high-quality reasoning data. Existing methods are either computationally expensive or fail to reliably distinguish high- from low-quality reasoning samp…

  66. Hugging Face Daily Papers TIER_1 English(EN) ·

    Ratchet: A Minimal Hygiene Recipe for Self-Evolving LLM Agents

    Self-evolving skill libraries, pioneered by Voyager, let frozen LLM agents accumulate reusable knowledge without weight updates, yet recent evaluation shows that LLM-authored skills deliver $+0.0$pp over no-skill baselines while human-curated ones deliver $+16.2$pp: the bottlenec…

  67. arXiv cs.CL TIER_1 English(EN) · Peiyang He ·

    Ratchet: A Minimal Hygiene Recipe for Self-Evolving LLM Agents

    Self-evolving skill libraries, pioneered by Voyager, let frozen LLM agents accumulate reusable knowledge without weight updates, yet recent evaluation shows that LLM-authored skills deliver $+0.0$pp over no-skill baselines while human-curated ones deliver $+16.2$pp: the bottlenec…

  68. Hugging Face Daily Papers TIER_1 English(EN) ·

    The Illusion of Reasoning: Exposing Evasive Data Contamination in LLMs via Zero-CoT Truncation

    A black-box detection method called Zero-CoT Probe is introduced to identify data contamination in large language models by truncating reasoning processes and comparing performance on original and perturbed datasets.

  69. arXiv cs.CL TIER_1 English(EN) · Yu Meng ·

    You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories

    Reinforcement learning with verifiable rewards (RLVR) has become a dominant paradigm for improving reasoning in large language models (LLMs), yet the underlying geometry of the resulting parameter trajectories remains underexplored. In this work, we demonstrate that RLVR weight t…

  70. arXiv cs.AI TIER_1 English(EN) · Ayse K. Coskun ·

    PALS: Power-Aware LLM Serving for Mixture-of-Experts Models

    Large language model (LLM) inference has become a dominant workload in modern data centers, driving significant GPU utilization and energy consumption. While prior systems optimize throughput and latency by batching, scheduling, and parallelism, they largely treat GPU power as a …

  71. Hugging Face Daily Papers TIER_1 English(EN) ·

    PALS: Power-Aware LLM Serving for Mixture-of-Experts Models

    Large language model (LLM) inference has become a dominant workload in modern data centers, driving significant GPU utilization and energy consumption. While prior systems optimize throughput and latency by batching, scheduling, and parallelism, they largely treat GPU power as a …

  72. Hugging Face Daily Papers TIER_1 English(EN) ·

    Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents

    Diagnosing failures in LLM agents remains largely manual. Practitioners inspect a small subset of execution traces, form ad-hoc hypotheses, and iterate. This process misses patterns that only emerge across trace populations and does not scale to production corpora where individua…

  73. arXiv cs.AI TIER_1 English(EN) · Xue ·

    Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents

    Diagnosing failures in LLM agents remains largely manual. Practitioners inspect a small subset of execution traces, form ad-hoc hypotheses, and iterate. This process misses patterns that only emerge across trace populations and does not scale to production corpora where individua…

  74. arXiv cs.AI TIER_1 English(EN) · Hong Xu ·

    Frontier: Towards Comprehensive and Accurate LLM Inference Simulation

    Modern LLM serving is no longer homogeneous or monolithic. Production systems now combine disaggregated execution, complex parallelism, runtime optimizations, and stateful workloads such as reasoning, agents, and RL rollouts. Simulation is attractive for exploring this growing de…

  75. arXiv cs.AI TIER_1 English(EN) · Tyler Sorensen ·

    Llamas on the Web: Memory-Efficient, Performance-Portable, and Multi-Precision LLM Inference with WebGPU

    Running language models in the browser presents a unique opportunity to build efficient, private, and portable AI applications, but requires contending with constrained memory availability and heterogeneous hardware targets. To realize this opportunity, we present Llamas on the W…

  76. arXiv cs.CL TIER_1 English(EN) · Yuzhang Shang ·

    TIDE: Efficient and Lossless MoE Diffusion LLM Inference with I/O-aware Expert Offload

    Diffusion Large Language Models (dLLMs) have emerged as a competitive alternative to autoregressive (AR) models, offering better hardware utilization and bidirectional context through parallel block-level decoding. However, as dLLMs continue to scale up with mixture-of-experts (M…

  77. arXiv cs.CL TIER_1 English(EN) · Yinghuan Shi ·

    Are Tools Always Beneficial? Learning to Invoke Tools Adaptively for Dual-Mode Multimodal LLM Reasoning

    Tool-augmented reasoning has emerged as a promising direction for enhancing the reasoning capabilities of multimodal large language models (MLLMs). However, existing studies mainly focus on enabling models to perform tool invocation, while neglecting the necessity of invoking too…

  78. arXiv cs.AI TIER_1 English(EN) · Egor Shvetsov ·

    Prior Knowledge or Search? A Study of LLM Agents in Hardware-Aware Code Optimization

    LLM discovery and optimization systems are increasingly applied across domains, implementing a common propose-evaluate-revise loop. Such optimization or discovery progresses via context conditioning on received feedback from an environment. However, as modern LLM agents are incre…

  79. Hugging Face Daily Papers TIER_1 English(EN) ·

    Prior Knowledge or Search? A Study of LLM Agents in Hardware-Aware Code Optimization

    LLM discovery and optimization systems are increasingly applied across domains, implementing a common propose-evaluate-revise loop. Such optimization or discovery progresses via context conditioning on received feedback from an environment. However, as modern LLM agents are incre…

  80. arXiv cs.CL TIER_1 English(EN) · Xuanjing Huang ·

    LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening

    Evaluating large language models (LLMs) on natural-language logical reasoning is essential because rule-governed tasks require conclusions to follow strictly from stated premises. Many existing logical-reasoning benchmarks are generated by templating natural-language items from s…

  81. arXiv cs.CL TIER_1 English(EN) · Jieping Ye ·

    Backtracking When It Strays: Mitigating Dual Exposure Biases in LLM Reasoning Distillation

    Large language models (LLMs) have achieved remarkable success in complex reasoning tasks via long chain-of-thought (CoT), yet their immense computational overhead hinders real-world deployment. LLM reasoning distillation addresses this by transferring reasoning capabilities from …

  82. arXiv cs.CL TIER_1 English(EN) · Jitao Sang ·

    Taming the Thinker: Conditional Entropy Shaping for Adaptive LLM Reasoning

    Entropy-based deep reasoning has emerged as a promising direction for improving the reasoning capabilities of Large Language Models (LLMs), but existing methods often either increase response length indiscriminately or shorten responses at the cost of accuracy. To better balance …

  83. arXiv cs.CL TIER_1 English(EN) · Hua Wei ·

    Diagnosing Multi-step Reasoning Failures in Black-box LLMs via Stepwise Confidence Attribution

    Large Language Models have achieved strong performance on reasoning tasks with objective answers by generating step-by-step solutions, but diagnosing where a multi-step reasoning trace might fail remains difficult. Confidence estimation offers a diagnostic signal, yet existing me…

  84. arXiv cs.AI TIER_1 English(EN) · Pascal Van Hentenryck ·

    Democratizing Large-Scale Re-Optimization with LLM-Guided Model Patches

    Optimization models developed by operations research (OR) experts are often deployed as decision-support systems in industrial settings. However, real-world environments are dynamic, with evolving business rules, previously overlooked constraints, and unforeseen perturbations. In…

  85. arXiv cs.AI TIER_1 English(EN) · Shaowu Pan ·

    SCICONVBENCH: Benchmarking LLMs on Multi-Turn Clarification for Task Formulation in Computational Science

    Large Language Models (LLMs) are increasingly deployed as scientific AI assistants, and a growing body of benchmarks evaluates their capabilities across knowledge retrieval, reasoning, code generation, and tool use. These evaluations, however, typically assume the scientific pr…

  86. arXiv cs.CL TIER_1 English(EN) · Song Guo ·

    KVDrive: A Holistic Multi-Tier KV Cache Management System for Long-Context LLM Inference

    Supporting long-context LLMs is challenging due to the substantial memory demands of the key-value (KV) cache. Existing offloading systems store the full cache in host memory and selectively fetch critical entries during decoding, but this strategy quickly hits a ceiling: sparsit…

  87. arXiv cs.CL TIER_1 English(EN) · Maosong Sun ·

    AutoVecCoder: Teaching LLMs to Generate Explicitly Vectorized Code

    Vectorization via Single Instruction, Multiple Data (SIMD) architectures is a cornerstone of high-performance computing. To fully exploit hardware potential, developers often resort to explicit vectorization using intrinsics, as compiler-based auto-vectorization frequently yields…

  88. arXiv cs.MA (Multiagent) TIER_1 English(EN) · James Evans ·

    Multi-LLM Systems Exhibit Robust Semantic Collapse

    Whether machines can originate novel content has been debated for nearly two centuries, from Lovelace's assertion that no engine can "originate anything" to Turing's question of whether a machine can amplify ideas brought in from outside. Multi-large language model (LLM) systems,…

  89. arXiv cs.LG TIER_1 English(EN) · Wes Armour ·

    Runtime-Orchestrated Second-Order Optimization for Scalable LLM Training

    Second-order methods offer an attractive path toward more sample-efficient LLM training, but their practical use is often blocked by the systems cost of maintaining and updating large matrix-based optimizer states. We introduce Asteria, a runtime system designed to remov…

  90. Hugging Face Daily Papers TIER_1 English(EN) ·

    Rule2DRC: Benchmarking LLM Agents for DRC Script Synthesis with Execution-Guided Test Generation

    Rule2DRC introduces a large-scale benchmark for DRC script synthesis with 1,000 rule-to-script tasks and 13,921 evaluation layouts, along with SplitTester which improves program selection through execution-based feedback.

  91. arXiv cs.CV TIER_1 English(EN) · Zehao Wang, Yihan Zeng, Zidong Gong, Yuanfan Guo, Feng Zhu, Hongzhi Zhang, Wei Zhang, Wangmeng Zuo ·

    AnE: Pushing the Reasoning Frontier of Multimodal LLMs via Anchor Evolution

    arXiv:2605.25571v1 Announce Type: new Abstract: Post-training via Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) is crucial for enhancing reasoning in Multimodal Large Language Models (MLLMs), yet existing paradigms often reach a performance bottleneck due to the li…

  92. arXiv stat.ML TIER_1 English(EN) · Junghyun Lee, Sanghwa Kim, Yassir Jedra, Alexandre Proutière, Se-Young Yun ·

    Instance-Optimal Estimation with Multiple LLM Judges on a Budget

    arXiv:2605.23362v1 Announce Type: cross Abstract: Evaluating large language models increasingly relies on LLM-as-a-judge protocols, but such evaluations remain costly: different judges have different prices and reliabilities, and the difficulty of each prompt-response pair can va…

  93. arXiv stat.ML TIER_1 English(EN) · Weijie Su ·

    CurveRL: Principled Distribution-Aware Context Reweighting for LLM Reasoning

    Context or prompt-level reweighting has emerged as a central algorithmic lever in Reinforcement Learning with Verified Rewards (RLVR) for improving the reasoning capability of large language models, yet the principle determining what constitutes an optimal weighting remains poorl…

  94. arXiv stat.ML TIER_1 English(EN) · Se-Young Yun ·

    Instance-Optimal Estimation with Multiple LLM Judges on a Budget

    Evaluating large language models increasingly relies on LLM-as-a-judge protocols, but such evaluations remain costly: different judges have different prices and reliabilities, and the difficulty of each prompt-response pair can vary substantially. This raises a basic allocation q…

  95. arXiv stat.ML TIER_1 English(EN) · Hamed Khosravi, Xiaoming Huo ·

    Conformal Selective Acting: Anytime-Valid Risk Control for RLVR-Trained LLMs

    arXiv:2605.20270v1 Announce Type: cross Abstract: A local specialist LLM, fine-tuned with reinforcement learning from verifiable rewards (RLVR) on operator-local data, is installed in a regulated organization with per-deployment error budget $\alpha$. The operator needs a safety …

  96. arXiv stat.ML TIER_1 English(EN) · J. G. Dai, Tianze Deng, Yueying Li, Tianyi Peng ·

    Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents

    arXiv:2504.07347v3 Announce Type: replace Abstract: As demand for Large Language Models (LLMs) and AI agents grows rapidly, optimizing systems for efficient LLM inference becomes critical. While significant efforts have targeted system-level engineering, little has been explored …

  97. arXiv stat.ML TIER_1 English(EN) · Xiaoming Huo ·

    Conformal Selective Acting: Anytime-Valid Risk Control for RLVR-Trained LLMs

    A local specialist LLM, fine-tuned with reinforcement learning from verifiable rewards (RLVR) on operator-local data, is installed in a regulated organization with per-deployment error budget $α$. The operator needs a safety certificate for this deployment's stream at every round…

  98. arXiv stat.ML TIER_1 English(EN) · Ruicheng Ao, Gan Luo, David Simchi-Levi, Xinshang Wang ·

    Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints

    arXiv:2504.11320v3 Announce Type: replace-cross Abstract: Large language models now serve millions of users daily, with providers incurring costs exceeding $700,000 per day. Each request requires token-by-token inference, making GPU scheduling central to latency, capacity, and co…

  99. Databricks Blog TIER_1 English(EN) ·

    Accelerating LLM Inference with Prompt Caching for Open‑Source Models on Databricks

    Why Prompt Caching Matters: Large language model (LLM) inference often involves repeated...

  100. Together AI blog TIER_1 English(EN) ·

    AI for Systems: Using LLMs to Optimize Database Query Execution

    New research shows LLMs can optimize database query execution plans—achieving up to 4.78x speedups by correcting the cardinality estimation errors that statistical heuristics miss.

  101. Together AI blog TIER_1 English(EN) ·

    Introducing the Together AI Batch API: Process Thousands of LLM Requests at 50% Lower Cost

  102. Together AI blog TIER_1 English(EN) ·

    Mixture-of-Agents Alignment: Harnessing the Collective Intelligence of Open-Source LLMs to Improve Post-Training

  103. Hacker News — AI stories ≥50 points TIER_1 English(EN) · AMavorParker ·

    PopuLoRA: Co-Evolving LLM Populations for Reasoning Self-Play

  104. Medium — MLOps tag TIER_1 English(EN) · The_Turingetic_Guy ·

    Large-Scale Distributed LLM Inference — Part 1

  105. Medium — fine-tuning tag TIER_1 English(EN) · Boring Developer ·

    Fine-Tuning LLM: Building Personality of AI

  106. Medium — fine-tuning tag TIER_1 English(EN) · QuarkAndCode ·

    Fine-Tuning and Alignment: How Domain Adaptation Builds Specialized LLMs

  107. Medium — fine-tuning tag TIER_1 English(EN) · QuarkAndCode ·

    Why Pretrained LLMs Need Fine-Tuning for Better AI Performance

  108. Medium — MLOps tag TIER_1 English(EN) · Charan Panthangi ·

    Inference Optimization — How to Make LLMs Faster and Cheaper in Production

  109. dev.to — LLM tag TIER_1 English(EN) · pixelbank dev ·

    Applications of LLMs: A Deep Dive + Problem: Information Gain

    A daily deep dive into LLM topics, coding problems, and platform features from PixelBank (https://pixelbank.dev). Topic Deep Dive: Applications of LLMs. From the Introduction to LLMs chapter. Intr…

  110. dev.to — LLM tag TIER_1 English(EN) · David Moores ·

    Benchmarking LLM Structured Outputs

    Cross-posted from carrick.tools (https://carrick.tools/blog/benchmarking-llm-structured-outputs/). When you read the API documentation for OpenAI, Anthropic, or Google Gemini, the feature called "structure…

  111. dev.to — LLM tag TIER_1 English(EN) · Mustafa ERBAY ·

    LLM Inference Caching: How to Balance Cost and Latency?

    Introduction to LLM Inference Caching: Why It Matters? When working with Large Language Models (LLMs), especially as you start using them in production environments, one of the first major challenges you'll face is the delicate balance between cost and latency. LLMs…

  112. dev.to — LLM tag TIER_1 English(EN) · Nishkarsh Sahu ·

    Building a Rails-Native AI Abstraction Layer for Local and Hosted LLMs

    Recently I’ve been experimenting with integrating local AI runtimes into Rails applications using tools like Ollama and LM Studio. At first, the integration looked straightforward: make an HTTP request, stream the response, and return the generated text. Bu…

  113. dev.to — LLM tag TIER_1 English(EN) · Kotcherla Murali Krishna ·

    Modular LLM Inference Engine from Scratch

    Why vLLM, TensorRT-LLM, and llama.cpp each solve only part of the problem, and how I built inferx to fill the gap. Runs on any laptop, no GPU needed. I spent the last few months building inferx, an open-source LLM inference optimization library that runs on any machin…

  114. Mastodon — mastodon.social TIER_1 Русский(RU) · [email protected] ·

    LLM Scaling: From a Single Chip to the Data Center. Chapter 2. Sharding This is a continuation of the series of articles on scaling LLM training and inference. Previous chapter

    [Translation] LLM Scaling: From a Single Chip to the Data Center. Chapter 2. Sharding. This is a continuation of the series of articles on scaling LLM training and inference. The previous chapter is available at this link. So, with the basics covered, let's now work out how to spread the matrices across …