PulseAugur

ML practitioner maps AI model circuits using contrastive SFT

A machine learning practitioner is exploring a novel method for understanding and controlling AI model behavior by mapping causal dependencies between capabilities. The approach uses contrastive supervised fine-tuning (SFT) to isolate circuits tied to specific capabilities in a 31B-parameter model. The practitioner trains variants that emphasize or de-emphasize particular capability dimensions, then ablates the circuits this identifies, with the aim of building a causal dependency graph of the model's capabilities. Such a graph could inform the order in which capabilities are trained in future models and improve behavioral control.
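As a rough illustration of the last two steps (turning ablation results into a dependency graph, then into a training order), here is a minimal Python sketch. It assumes a hypothetical table of ablation effects in which each entry records how much ablating the circuit isolated for one capability lowers a judge score for another. The capability names, numbers, and the 0.1 threshold are invented for illustration; neither the post nor the summary specifies how edges are actually derived.

```python
from graphlib import TopologicalSorter

# HYPOTHETICAL ablation results: ABLATION_EFFECT[(a, b)] is the drop in the
# judge score for capability b when the circuit isolated for capability a is
# ablated. Names and values are illustrative only, not from the post.
ABLATION_EFFECT = {
    ("reasoning", "math"): 0.42,
    ("reasoning", "coding"): 0.31,
    ("math", "coding"): 0.18,
    ("coding", "math"): 0.02,
    ("tone", "reasoning"): 0.01,
}


def build_dependency_graph(effects, threshold=0.1):
    """Map each capability to the set of capabilities it depends on.

    b is recorded as depending on a only when ablating a's circuit lowers
    b's score by more than `threshold` (an assumed cutoff).
    """
    graph = {}
    for (src, dst), drop in effects.items():
        graph.setdefault(src, set())
        graph.setdefault(dst, set())
        if drop > threshold:
            graph[dst].add(src)
    return graph


def training_order(graph):
    """Order capabilities so every prerequisite comes before its dependents.

    TopologicalSorter takes a mapping of node -> predecessors and raises
    graphlib.CycleError if the thresholded graph contains a cycle.
    """
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    graph = build_dependency_graph(ABLATION_EFFECT)
    print(training_order(graph))
```

Topological sorting is just one simple way to turn such a graph into a linear order. A `graphlib.CycleError` would flag mutually dependent capabilities, which no single training order can fully respect.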

IMPACT This research could lead to more predictable and controllable AI behavior by mapping internal causal dependencies.

RANK_REASON The item describes a novel research methodology for understanding AI model internals, not a formal publication or a product release.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. r/MachineLearning · TIER_1 · English (EN) · /u/Substantial_Diver469

    Contrastive targeted SFT as a mechinterp method - has anyone mapped causal dependency interactions this way? [D]

    Hi All, I've been running experiments on targeted SFT for specific capability dimensions on a 31B model. After a small training run to prime the model slightly in the direction I want, I then ran a judge across 40 domains scoring six independ…
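The excerpt mentions a judge scoring six dimensions across 40 domains. One hypothetical way to read such scores contrastively is to average, per dimension, the score difference between an "emphasized" and a "de-emphasized" variant, then flag large shifts on dimensions other than the targeted one as candidate dependencies. The minimal Python sketch below assumes that framing; the data layout, dimension names, and the `tol` cutoff are illustrative assumptions, not the poster's actual setup.

```python
from statistics import mean


def contrastive_deltas(scores_plus, scores_minus):
    """Average (plus - minus) judge score per dimension over shared domains.

    ASSUMED layout: scores[domain][dimension] -> numeric judge score.
    """
    domains = sorted(set(scores_plus) & set(scores_minus))
    if not domains:
        raise ValueError("no shared domains to compare")
    # Only compare dimensions scored for both variants in every shared domain.
    dims = sorted(
        set.intersection(
            *(set(scores_plus[d]) & set(scores_minus[d]) for d in domains)
        )
    )
    return {
        dim: mean(scores_plus[d][dim] - scores_minus[d][dim] for d in domains)
        for dim in dims
    }


def off_target_effects(deltas, target, tol=0.5):
    """Dimensions other than `target` whose average shift exceeds `tol`."""
    return {dim: d for dim, d in deltas.items() if dim != target and abs(d) > tol}


if __name__ == "__main__":
    # HYPOTHETICAL scores for two domains and three dimensions.
    plus = {"law": {"rigor": 8.0, "tone": 6.0, "brevity": 5.0},
            "med": {"rigor": 7.0, "tone": 6.5, "brevity": 4.0}}
    minus = {"law": {"rigor": 5.0, "tone": 6.0, "brevity": 6.0},
             "med": {"rigor": 4.0, "tone": 6.0, "brevity": 5.0}}
    deltas = contrastive_deltas(plus, minus)
    print(deltas)
    print(off_target_effects(deltas, target="rigor"))
```

With this toy data, emphasizing "rigor" raises it by 3.0 on average while "brevity" drops by 1.0, so "brevity" is flagged as an off-target effect worth probing with ablation.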