PulseAugur

supervised fine-tuning

PulseAugur coverage of supervised fine-tuning: every cluster that mentions it across labs, papers, and developer communities, ranked by signal.

Total · 30 days: 21 (90 days: 21)
Releases · 30 days: 0 (90 days: 0)
Papers · 30 days: 20 (90 days: 20)

Recent · Page 1 of 2 · 21 items total
  1. TOOL · CL_48874 ·

    New SFT objectives outperform NLL for capable LLMs

    Researchers have explored alternative objectives for supervised fine-tuning (SFT) of large language models, moving beyond the standard negative log likelihood (NLL). Their study, involving extensive experiments across v…

  2. TOOL · CL_44357 ·

    Anyscale launches skill to automate LLM post-training runs

    Anyscale has introduced a new Anyscale Agent Skill designed to simplify and automate the process of generating LLM post-training runs. This skill assists users in selecting the most appropriate post-training method, suc…

  3. TOOL · CL_41882 ·

    Off-model SFT degrades AI capabilities by forcing unfamiliar reasoning styles

    Researchers have found that Supervised Fine-Tuning (SFT) using outputs from a different AI model can significantly degrade the capabilities of the trained model. This degradation appears to be linked to the model adopti…

  4. TOOL · CL_35086 ·

    LLM Fine-Tuning Explained: SFT, RAG, and Data Preparation

    This blog post explains the process and necessity of fine-tuning large language models (LLMs) for specific tasks. It differentiates fine-tuning from Retrieval-Augmented Generation (RAG), stating that fine-tuning is best…

  5. TOOL · CL_30798 ·

    New method searches data recipes for optimal AI model fine-tuning

    Researchers have developed a new method for supervised fine-tuning (SFT) data selection, moving beyond simple instance ranking to a "data recipe search" approach. This technique uses a library of operators like filterin…

  6. TOOL · CL_29395 ·

    LoRA parameter placement impacts GRPO fine-tuning, not SFT

    Researchers have investigated the parameter placement problem within Low-Rank Adaptation (LoRA) for fine-tuning large language models. Their study reveals that for Supervised Fine-Tuning (SFT), the specific placement of…

  7. TOOL · CL_27562 ·

    New training method combats LLM diversity loss

    Researchers have developed a new method called annotation-anchored training to address semantic mode collapse in large language models. This technique involves pretraining models on documents paired with semantic annota…

  8. TOOL · CL_22133 ·

    LLM reasoning emerges via Inverse Tree Freezing, improving multi-step thinking

    Researchers have developed a new framework called Inverse Tree Freezing to understand how large language models (LLMs) achieve complex reasoning. This model views the LLM's learning process as a random walk on a 'Concep…

  9. TOOL · CL_21301 ·

    LoRA fine-tuning: Style learning or pattern memorization?

    A recent analysis explores whether fine-tuning a LoRA adapter on a specific writing style, like "Tenacious-style" sales emails, results in genuine style imitation or mere memorization of augmented patterns. The study fo…

  10. RESEARCH · CL_21818 ·

    Pest-Thinker uses RL to help MLLMs reason like entomologists

    Researchers have developed Pest-Thinker, a novel reinforcement learning framework designed to enhance the reasoning capabilities of multimodal large language models (MLLMs) for agricultural pest identification. This sys…

  11. RESEARCH · CL_21952 ·

    New methods enhance on-policy distillation for LLMs

    Researchers have developed new methods to improve the efficiency and stability of on-policy distillation (OPD) for large language models. One approach, vOPD, uses a control variate baseline derived from the reverse KL d…

  12. TOOL · CL_20628 ·

    ProFit method enhances LLM fine-tuning by prioritizing high-value signals

    Researchers have developed a new supervised fine-tuning (SFT) method called ProFit, designed to improve the alignment of Large Language Models (LLMs) with human intent. ProFit addresses the issue of overfitting to speci…

  13. TOOL · CL_15707 ·

    Researchers use RL to improve MLLM regression on imbalanced data

    Researchers have developed a new framework to improve how multimodal large language models (MLLMs) handle numerical regression tasks, particularly those with imbalanced data distributions. Existing training methods ofte…

  14. RESEARCH · CL_15881 ·

    Judge-R1 framework enhances legal document generation with agentic information retrieval

    Researchers have developed Judge-R1, a new framework to improve the automated drafting of legal judgment documents. This system uses an agentic approach to collect relevant legal information and a reinforcement learning…

  15. RESEARCH · CL_12572 ·

    AI model finetuning mostly idempotent, DPO can amplify traits

    A guide explores advanced techniques for post-training large language models, focusing on Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO). These methods …

  16. RESEARCH · CL_14206 ·

    New DoTS framework synthesizes SFT and RLVR LLM capabilities at inference time

    Researchers have developed a novel post-hoc framework called Decoupled Test-time Synthesis (DoTS) to integrate Supervised Fine-Tuning (SFT) and Reinforcement Learning with Verifiable Rewards (RLVR) for large language models…

  17. RESEARCH · CL_08690 ·

    New GFT framework unifies SFT and RL for more stable LLM training

    Researchers have introduced Group Fine-Tuning (GFT), a novel framework designed to unify supervised fine-tuning (SFT) and reinforcement learning (RL) for large language models. GFT addresses limitations in traditional S…

  18. RESEARCH · CL_06733 ·

    AgentHER framework boosts LLM agent training with failed trajectory relabeling

    Researchers have developed AgentHER, a new framework designed to improve the training of LLM agents by repurposing failed trajectories. The system adapts Hindsight Experience Replay to natural language, identifying alte…

  19. RESEARCH · CL_11424 ·

    LLMs may 'hack' RL training; researchers probe generalization mechanisms

    Two new papers explore the complexities of reinforcement learning (RL) in large language models (LLMs). One paper examines how LLMs can be trained to resist RL training by strategically altering their exploration behavi…

  20. RESEARCH · CL_08368 ·

    Compute Aligned Training optimizes LLMs for test-time inference strategies

    Researchers have introduced a new training methodology called Compute Aligned Training, designed to better optimize Large Language Models (LLMs) for their performance during inference. Traditional methods like Supervise…