PulseAugur

supervised fine-tuning

PulseAugur coverage of supervised fine-tuning: every cluster mentioning the topic across labs, papers, and developer communities, ranked by signal.

Total · 30d: 11 (11 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 11 (11 over 90d)
TIER MIX · 90D
RELATIONSHIPS
SENTIMENT · 30D: 3 days with sentiment data

RECENT · PAGE 1/1 · 16 TOTAL
  1. TOOL · CL_29395 ·

    LoRA parameter placement impacts GRPO fine-tuning, not SFT

    Researchers have investigated the parameter placement problem within Low-Rank Adaptation (LoRA) for fine-tuning large language models. Their study reveals that for Supervised Fine-Tuning (SFT), the specific placement of…
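    (See the illustrative LoRA-placement sketch after this list.)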

  2. TOOL · CL_27562 ·

    New training method combats LLM diversity loss

    Researchers have developed a new method called annotation-anchored training to address semantic mode collapse in large language models. This technique involves pretraining models on documents paired with semantic annota…

  3. TOOL · CL_22133 ·

    LLM reasoning emerges via Inverse Tree Freezing, improving multi-step thinking

    Researchers have developed a new framework called Inverse Tree Freezing to understand how large language models (LLMs) achieve complex reasoning. This model views the LLM's learning process as a random walk on a 'Concep…

  4. RESEARCH · CL_21952 ·

    New methods enhance on-policy distillation for LLMs

    Researchers have developed new methods to improve the efficiency and stability of on-policy distillation (OPD) for large language models. One approach, vOPD, uses a control variate baseline derived from the reverse KL d…
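    (See the generic control-variate distillation sketch after this list.)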

  5. TOOL · CL_21301 ·

    LoRA fine-tuning: Style learning or pattern memorization?

    A recent analysis explores whether fine-tuning a LoRA adapter on a specific writing style, like "Tenacious-style" sales emails, results in genuine style imitation or mere memorization of augmented patterns. The study fo…

  6. RESEARCH · CL_21818 ·

    Pest-Thinker uses RL to help MLLMs reason like entomologists

    Researchers have developed Pest-Thinker, a novel reinforcement learning framework designed to enhance the reasoning capabilities of multimodal large language models (MLLMs) for agricultural pest identification. This sys…

  7. TOOL · CL_20628 ·

    ProFit method enhances LLM fine-tuning by prioritizing high-value signals

    Researchers have developed a new supervised fine-tuning (SFT) method called ProFit, designed to improve the alignment of Large Language Models (LLMs) with human intent. ProFit addresses the issue of overfitting to speci…
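    (See the generic token-weighted SFT loss sketch after this list.)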

  8. RESEARCH · CL_15881 ·

    Judge-R1 framework enhances legal document generation with agentic information retrieval

    Researchers have developed Judge-R1, a new framework to improve the automated drafting of legal judgment documents. This system uses an agentic approach to collect relevant legal information and a reinforcement learning…

  9. TOOL · CL_15707 ·

    Researchers use RL to improve MLLM regression on imbalanced data

    Researchers have developed a new framework to improve how multimodal large language models (MLLMs) handle numerical regression tasks, particularly those with imbalanced data distributions. Existing training methods ofte…

  10. RESEARCH · CL_12572 ·

    AI model finetuning mostly idempotent, DPO can amplify traits

    A guide explores advanced techniques for post-training large language models, focusing on Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO). These methods …
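    (See the reference DPO loss sketch after this list.)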

  11. RESEARCH · CL_14206 ·

    New DoTS framework synthesizes SFT and RLVR LLM capabilities at inference time

    Researchers have developed a novel post-hoc framework called Decoupled Test-time Synthesis (DoTS) to integrate Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) for large language models…

  12. RESEARCH · CL_08690 ·

    New GFT framework unifies SFT and RL for more stable LLM training

    Researchers have introduced Group Fine-Tuning (GFT), a novel framework designed to unify supervised fine-tuning (SFT) and reinforcement learning (RL) for large language models. GFT addresses limitations in traditional S…

  13. RESEARCH · CL_06733 ·

    AgentHER framework boosts LLM agent training with failed trajectory relabeling

    Researchers have developed AgentHER, a new framework designed to improve the training of LLM agents by repurposing failed trajectories. The system adapts Hindsight Experience Replay to natural language, identifying alte…
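    (See the hindsight-relabeling sketch after this list.)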

  14. RESEARCH · CL_11424 ·

    LLMs may 'hack' RL training; researchers probe generalization mechanisms

    Two new papers explore the complexities of reinforcement learning (RL) in large language models (LLMs). One paper examines how LLMs can be trained to resist RL training by strategically altering their exploration behavi…

  15. RESEARCH · CL_08368 ·

    Compute Aligned Training optimizes LLMs for test-time inference strategies

    Researchers have introduced a new training methodology called Compute Aligned Training, designed to better optimize Large Language Models (LLMs) for their performance during inference. Traditional methods like Supervise…

  16. COMMENTARY · CL_05249 ·

    Reinforcement learning may be pushing AI models toward alien reasoning, away from human personas

    A recent analysis suggests that reinforcement learning (RL) applied after initial model training may significantly alter language model behavior in ways not captured by simple "persona" theories. While supervised fine-t…
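
ILLUSTRATIVE CODE SKETCHES

Item 1 (CL_29395) turns on where LoRA adapters are placed inside a transformer. The sketch below only shows what "parameter placement" means in practice with the Hugging Face peft library: two LoraConfig variants that attach adapters to different module sets. The checkpoint name and the Llama-style module names (q_proj, gate_proj, and so on) are placeholders, and neither config is the experimental protocol of the summarized paper.

```python
# Minimal sketch: LoRA "parameter placement" = choosing which weight matrices
# receive low-rank adapters. Checkpoint and module names are placeholders for
# a Llama-style model, not the setup used in the summarized study.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")

# Placement A: adapters only on the attention projections.
attn_only = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Placement B: adapters on attention and MLP projections.
attn_and_mlp = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, attn_and_mlp)
model.print_trainable_parameters()
```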
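Item 4 (CL_21952) mentions a control-variate baseline for on-policy distillation. The sketch below is only the generic REINFORCE-with-baseline idea: sample from the student, score each sample with a per-sequence reverse-KL cost (student log-prob minus teacher log-prob), and subtract a scalar baseline before forming the surrogate loss. The sample and log_prob helpers are hypothetical model methods, and the plain scalar baseline stands in for, but is not, the vOPD construction described in the paper.

```python
import torch

def opd_step(student, teacher, prompts, baseline: float):
    """One REINFORCE-style on-policy distillation step with a scalar baseline.

    `student.sample` / `.log_prob` and `teacher.log_prob` are hypothetical
    helpers; only the generic control-variate idea is shown, not vOPD itself.
    """
    # Sample continuations from the current student policy (on-policy).
    samples = student.sample(prompts)
    logp_student = student.log_prob(samples)          # differentiable
    with torch.no_grad():
        logp_teacher = teacher.log_prob(samples)

    # Per-sample reverse-KL cost: log pi_student(y) - log pi_teacher(y).
    cost = (logp_student - logp_teacher).detach()

    # Subtracting a baseline keeps the gradient unbiased (E[grad log pi] = 0)
    # while reducing its variance.
    advantage = cost - baseline

    # Score-function surrogate: its gradient matches REINFORCE on the cost.
    loss = (advantage * logp_student).mean()
    loss.backward()
    return float(loss.detach())
```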
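Item 7 (CL_20628) is about steering SFT toward high-value training signal. Because the summary is truncated, the sketch below shows only the generic device of re-weighting the per-token cross-entropy so that more informative target tokens count more; the weighting itself is a placeholder input and is not ProFit's actual rule.

```python
import torch
import torch.nn.functional as F

def weighted_sft_loss(logits, labels, token_weights):
    """Token-weighted SFT cross-entropy (generic illustration, not ProFit).

    logits:        [batch, seq, vocab]
    labels:        [batch, seq], padding marked with -100
    token_weights: [batch, seq], larger values = higher-value tokens
    """
    per_token = F.cross_entropy(
        logits.transpose(1, 2), labels, reduction="none", ignore_index=-100
    )
    mask = (labels != -100).float()
    weighted = per_token * token_weights * mask
    # Normalize by total weight so the scale is comparable to plain SFT loss.
    return weighted.sum() / (token_weights * mask).sum().clamp(min=1.0)
```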
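Item 10 (CL_12572) covers SFT, DPO, and GRPO as post-training stages. As a reference point for the DPO part, the sketch below is the standard published DPO objective on summed sequence log-probabilities under the trained policy and a frozen reference model; it is not code from the guide being summarized, and beta=0.1 is just a common default.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta: float = 0.1):
    """Standard DPO objective.

    Each argument is a tensor of summed log p(response | prompt) under the
    trained policy or the frozen reference model, for the preferred ("chosen")
    and dispreferred ("rejected") response of each preference pair.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # Maximize the log-odds that the chosen response beats the rejected one.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```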
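Item 13 (CL_06733) adapts Hindsight Experience Replay to language agents: a trajectory that failed its original instruction is kept as a successful demonstration of whatever it actually achieved, and the relabeled example can rejoin the SFT pool. The sketch below is a conceptual data-relabeling step under that reading; describe_goal is a stand-in (in practice likely an LLM call) and the record format is invented, not AgentHER's.

```python
from typing import Dict, List

def describe_goal(achieved_outcome: str) -> str:
    # Placeholder: in practice this would likely be an LLM call that phrases
    # the achieved outcome as a natural-language instruction.
    return f"Complete the task so that the final state is: {achieved_outcome}"

def relabel_failed_trajectory(steps: List[str], achieved_outcome: str) -> Dict:
    """Hindsight-style relabeling: a failed trajectory becomes a successful
    demonstration for the goal it actually achieved (conceptual sketch only)."""
    return {
        "instruction": describe_goal(achieved_outcome),  # relabeled goal
        "trajectory": steps,                             # original actions
        "label": "success",  # success with respect to the new goal
    }

# Example: an agent asked to book a 9am flight instead booked the 11am one;
# the same actions are kept as a demonstration of "book the 11am flight".
example = relabel_failed_trajectory(
    ["search flights", "select 11am departure", "confirm booking"],
    "an 11am flight is booked",
)
```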