Google DeepMind: SFT Key to Gemini Model Safety

By PulseAugur Editorial · [2 sources] · 2026-06-13 15:31

Google DeepMind researchers report that Supervised Fine-Tuning (SFT), rather than other training stages such as Reinforcement Learning (RL), is the primary driver of safety properties in their Gemini models. In their experiments, versions of Gemini 3.1 Pro and Gemini 3 Flash that received only SFT on top of pre-training showed safety performance remarkably similar to that of their production counterparts. The finding suggests that SFT is a high-leverage intervention point for improving model safety and behavior in future Gemini development.

IMPACT Highlights SFT as a critical stage for ensuring AI safety, potentially guiding future development and evaluation strategies.

RANK_REASON Research update from a major AI lab detailing findings on model training and safety properties.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

Alignment Forum TIER_1 English(EN) · Josh Engels · 2026-06-13 15:31

SFT Drives Gemini’s Safety Properties

This is the third in a series of informal research updates from the Google DeepMind Language Model Interpretability team, in interpretability and adjacent areas. The second post can be found at https://www.lesswrong.com/posts/qi4mNbZYAFDYwfRba/buildin…

LessWrong (AI tag) TIER_1 English(EN) · Josh Engels · 2026-06-13 15:31

SFT Drives Gemini’s Safety Properties

This is the third in a series of informal research updates from the Google DeepMind Language Model Interpretability team, in interpretability and adjacent areas. The second post can be found at https://www.lesswrong.com/posts/qi4mNbZYAFDYwfRba/buildin…
