supervised fine-tuning
PulseAugur coverage of supervised fine-tuning — every cluster mentioning supervised fine-tuning across labs, papers, and developer communities, ranked by signal.
- used by Direct Preference Optimization: Your Language Model is Secretly a Reward Model 70%
- used by Group Relative Policy Optimization 70%
- instance of Direct Preference Optimization 70%
- other Direct Preference Optimization 60%
- used by ScienceCast 60%
- used by GRPO 60%
- used by alphaXiv 60%
- developed by Group Relative Policy Optimization 50%
-
New Intent-Aware Training Boosts LLM Safety Classifiers
Researchers have developed a new method for improving the safety classification of large language models by explicitly modeling user intent. They introduced AIMS, a dataset of 1,724 safety prompts with associated intent…
-
New AI architecture quantifies judicial discretion in legal outcome prediction
Researchers have developed a novel Judge-Aware Gated Multi-Task Learning architecture to better predict legal outcomes by distinguishing between factual case evidence and judicial discretion. This approach, evaluated on…
-
New Generalization Spectrum framework evaluates AI learning transfer
Researchers have introduced the Generalization Spectrum, a novel evaluation framework designed to assess how far a learning algorithm's knowledge can transfer beyond its training data. This approach moves beyond traditi…
-
New research explores weight-space geometry of AI reasoning distillation methods
A new research paper analyzes the geometric properties of weight updates across various offline reinforcement learning methods used for distilling reasoning capabilities into smaller AI models. The study trained six dif…
-
New benchmarks push video AI to ground answers in temporal evidence · 4 sources tracked
Two new research papers introduce benchmarks and models for video question answering that focus on temporal reasoning and evidence grounding. The EG-VQA benchmark, with over 11,000 QA pairs and temporal evidence annotat…
-
PointVG-R model enhances visual grounding with geometric reasoning · 3 sources tracked
Researchers have developed PointVG-R, a novel reasoning-guided Multi-modal Large Language Model (MLLM) designed to improve precise pointing localization in images. This model integrates geometric-aware reasoning, Reinfo…
-
VibeThinker 3B model surpasses Opus 4.5 in reasoning with novel SFT+GRPO
A new 3-billion-parameter model named VibeThinker has demonstrated superior reasoning capabilities compared to Anthropic's Opus 4.5. This performance was achieved using a novel combination of supervised fine-tuning (SFT…
-
New BALTO framework precisely targets LLM hallucinations at token level
Researchers from Shanghai Jiao Tong University and Tencent have developed BALTO, a novel reinforcement learning framework designed to precisely eliminate hallucinations in large language models (LLMs). The framework ope…
-
LLMs fail to reliably self-report adversarial prefill attacks, study finds
A new study published on arXiv investigates the ability of large language models (LLMs) to self-report when they have been influenced by adversarial prefill attacks. The research found that across ten different open-wei…
-
BoxCtrl framework enables precise 3D geometric image editing
Researchers have introduced BoxCtrl, a novel framework for precise 3D geometric image editing. This method utilizes 3D bounding boxes with distinct RGB colors projected onto 2D images as visual prompts, allowing for acc…
-
Knowledge distillation outperforms SFT in low-data LLM training
A new paper explores knowledge distillation (KD) for post-training large language models (LLMs), finding it outperforms supervised fine-tuning (SFT) in low-data scenarios. The effectiveness of KD diminishes as more data…
-
RLVR outperforms SFT for LLM reasoning, paper shows
A new paper analyzes why reinforcement fine-tuning, specifically Reinforcement Learning with Verifiable Rewards (RLVR), outperforms supervised fine-tuning (SFT) for improving the reasoning capabilities of large language…
-
New AI agents leverage world models and self-repair for enhanced reasoning
Researchers have introduced Qwen-AgentWorld, a novel language world model designed to simulate agent environments across seven domains. This model is trained through a three-stage pipeline including continual pre-traini…
-
Guide to Supervised Fine-Tuning Launched
This article serves as an introductory guide to supervised fine-tuning, marking the beginning of a series focused on this technique. It aims to educate readers on the fundamental concepts and initial steps involved in a…
-
Persistent homology tracks LLM representation changes during fine-tuning
Researchers have employed persistent homology to analyze the internal representation dynamics of large language models during supervised fine-tuning. Their study, which examined four transformer models (1B to 7B paramet…
-
New Agentic Data Tailoring paradigm structures multimodal streams
Researchers have introduced a new paradigm called Agentic Data Tailoring, which uses learnable data processing to structure high-entropy multimodal streams. The DataClaw_0-9B model, trained using supervised fine-tuning …
-
New 'Sparsity Curse' hinders merging of advanced RLVR AI models
A new research paper introduces the "Sparsity Curse" phenomenon, which describes how Reinforcement Learning with Verifiable Rewards (RLVR) models, despite their advanced reasoning capabilities, become difficult to merge …
-
New DRIFT method refines LLM training data for improved performance
Researchers have developed DRIFT, a novel method for refining instruction data to improve the performance ceiling of large language models. Unlike existing data curation techniques that focus on subset selection, DRIFT …
-
New research explores RL advancements for LLMs and AI agents · 8 sources tracked
Multiple research papers released on arXiv explore advancements in reinforcement learning (RL) for large language models (LLMs) and other AI agents. One paper introduces RiVER, a framework for training LLMs on score-bas…
-
Study compares LLM adaptation methods for French medical QA
A new study published on arXiv explores the effectiveness of different methods for adapting large language models (LLMs) to specialized domains and languages, using French medical question-answering as a case study. The…