Off-model SFT degrades AI capabilities by forcing unfamiliar reasoning styles

By PulseAugur Editorial · Summary by gemini-2.5-flash-lite from 1 source

Researchers have found that supervised fine-tuning (SFT) on outputs generated by a different AI model can significantly degrade the capabilities of the trained model. The degradation appears to be linked to the model adopting an unfamiliar reasoning style that it struggles to use effectively. It is not simply a result of imitating a less capable teacher: degradation occurs even when the teacher is the stronger model. The drop also appears to be shallow, since a small amount of training that restores the original reasoning style recovers most of the lost performance.

IMPACT Understanding how off-model SFT impacts AI capabilities is crucial for developing safer and more aligned AI systems.

COVERAGE [1]

LessWrong (AI tag) · SebastianP · 2026-05-21 00:35

Why does off-model SFT degrade capabilities?

Off-model SFT (SFT on outputs generated by a different model) might be an important method for controlling AI behavior. For instance, it seems like a central technique for overcoming exploration … (https://arxiv.org/abs/2604.22082)
