ENTITY supervised fine-tuning

supervised fine-tuning

PulseAugur coverage of supervised fine-tuning — every cluster mentioning supervised fine-tuning across labs, papers, and developer communities, ranked by signal.

Total · 30d

11 over 90d

Releases · 30d

0 over 90d

Papers · 30d

11 over 90d

TIER MIX · 90D

research 6
tool 4
commentary 1

RELATIONSHIPS

used by Group Relative Policy Optimization 90%

SENTIMENT · 30D

3 day(s) with sentiment data

RECENT · PAGE 1/1 · 16 TOTAL

TOOL · CL_29395 · May 12 · 14:46

LoRA parameter placement impacts GRPO fine-tuning, not SFT

Researchers have investigated the parameter placement problem within Low-Rank Adaptation (LoRA) for fine-tuning large language models. Their study reveals that for Supervised Fine-Tuning (SFT), the specific placement of…
TOOL · CL_27562 · May 11 · 05:11

New training method combats LLM diversity loss

Researchers have developed a new method called annotation-anchored training to address semantic mode collapse in large language models. This technique involves pretraining models on documents paired with semantic annota…
TOOL · CL_22133 · May 8 · 04:00

LLM reasoning emerges via Inverse Tree Freezing, improving multi-step thinking

Researchers have developed a new framework called Inverse Tree Freezing to understand how large language models (LLMs) achieve complex reasoning. This model views the LLM's learning process as a random walk on a 'Concep…
RESEARCH · CL_21952 · May 8 · 04:00

New methods enhance on-policy distillation for LLMs

Researchers have developed new methods to improve the efficiency and stability of on-policy distillation (OPD) for large language models. One approach, vOPD, uses a control variate baseline derived from the reverse KL d…
TOOL · CL_21301 · May 7 · 18:17

LoRA fine-tuning: Style learning or pattern memorization?

A recent analysis explores whether fine-tuning a LoRA adapter on a specific writing style, like "Tenacious-style" sales emails, results in genuine style imitation or mere memorization of augmented patterns. The study fo…
RESEARCH · CL_21818 · May 7 · 12:30

Pest-Thinker uses RL to help MLLMs reason like entomologists

Researchers have developed Pest-Thinker, a novel reinforcement learning framework designed to enhance the reasoning capabilities of multimodal large language models (MLLMs) for agricultural pest identification. This sys…
TOOL · CL_20628 · May 7 · 04:00

ProFit method enhances LLM fine-tuning by prioritizing high-value signals

Researchers have developed a new supervised fine-tuning (SFT) method called ProFit, designed to improve the alignment of Large Language Models (LLMs) with human intent. ProFit addresses the issue of overfitting to speci…
RESEARCH · CL_15881 · May 5 · 04:00

Judge-R1 framework enhances legal document generation with agentic information retrieval

Researchers have developed Judge-R1, a new framework to improve the automated drafting of legal judgment documents. This system uses an agentic approach to collect relevant legal information and a reinforcement learning…
TOOL · CL_15707 · May 5 · 04:00

Researchers use RL to improve MLLM regression on imbalanced data

Researchers have developed a new framework to improve how multimodal large language models (MLLMs) handle numerical regression tasks, particularly those with imbalanced data distributions. Existing training methods ofte…
RESEARCH · CL_12572 · May 1 · 21:03

AI model fine-tuning mostly idempotent, DPO can amplify traits

A guide explores advanced techniques for post-training large language models, focusing on Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO). These methods …
RESEARCH · CL_14206 · May 1 · 12:20

New DoTS framework synthesizes SFT and RLVR LLM capabilities at inference time

Researchers have developed a novel post-hoc framework called Decoupled Test-time Synthesis (DoTS) to integrate Supervised Fine-Tuning (SFT) and Reinforcement Learning with Verifiable Rewards (RLVR) for large language models…
RESEARCH · CL_08690 · Apr 29 · 04:00

New GFT framework unifies SFT and RL for more stable LLM training

Researchers have introduced Group Fine-Tuning (GFT), a novel framework designed to unify supervised fine-tuning (SFT) and reinforcement learning (RL) for large language models. GFT addresses limitations in traditional S…
RESEARCH · CL_06733 · Apr 28 · 04:00

AgentHER framework boosts LLM agent training with failed trajectory relabeling

Researchers have developed AgentHER, a new framework designed to improve the training of LLM agents by repurposing failed trajectories. The system adapts Hindsight Experience Replay to natural language, identifying alte…
RESEARCH · CL_11424 · Apr 27 · 21:22

LLMs may 'hack' RL training; researchers probe generalization mechanisms

Two new papers explore the complexities of reinforcement learning (RL) in large language models (LLMs). One paper examines how LLMs can be trained to resist RL training by strategically altering their exploration behavi…
RESEARCH · CL_08368 · Apr 27 · 19:52

Compute Aligned Training optimizes LLMs for test-time inference strategies

Researchers have introduced a new training methodology called Compute Aligned Training, designed to better optimize Large Language Models (LLMs) for their performance during inference. Traditional methods like Supervise…
COMMENTARY · CL_05249 · Apr 27 · 05:31

Reinforcement learning may be pushing AI models toward alien reasoning, away from human personas

A recent analysis suggests that reinforcement learning (RL) applied after initial model training may significantly alter language model behavior in ways not captured by simple "persona" theories. While supervised fine-t…
