Markovian Circuit Tracing for Transformer State Dynamic
Researchers have developed a new framework called Markovian Circuit Tracing (MCT) to analyze the internal state dynamics of transformer models. This method uses synthetic Hidden Markov Model (HMM) tasks to test if transformer activations exhibit coarse state-transition structures. The findings indicate that transformers can learn near-Bayes next-token predictors and that residual activations contain partial Bayesian belief information, with state patching significantly improving accuracy. AI
IMPACT Introduces a new benchmark and evaluation framework for transformer interpretability, potentially aiding in understanding model behavior.