Circuit Tracing in Autoregressive Protein Language Models
Researchers have developed ProGenMech, a framework for interpreting the internal computations of autoregressive protein language models. The method extends cross-layer transcoders to models such as ProGen3, enabling a more faithful recovery of generative computations across layers. Within ProGenMech, a zero-shot circuit discovery framework identifies the latent circuits responsible for protein generation and fitness prediction, revealing biologically meaningful motifs and functional regions.
AI IMPACT: Provides a new method for understanding, and potentially controlling, protein generation in AI models.
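
For readers unfamiliar with cross-layer transcoders, the toy sketch below illustrates the general idea only: sparse features are read from the residual stream at one layer and write to the reconstructed MLP outputs of that layer and every later one. It is not ProGenMech's implementation. The class name, dimensions, random weights, and the choice of ReLU are all assumptions made for illustration, and nothing here is trained or tied to ProGen3.

```python
import random


def relu(vec):
    """Elementwise ReLU, used here as the feature nonlinearity (an assumption)."""
    return [max(0.0, x) for x in vec]


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


class ToyCrossLayerTranscoder:
    """Hypothetical toy cross-layer transcoder with random, untrained weights."""

    def __init__(self, n_layers, d_model, n_features, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.n_layers = n_layers
        self.d_model = d_model
        # One encoder per layer: residual stream -> feature activations.
        self.enc = [rand(n_features, d_model) for _ in range(n_layers)]
        # One decoder per (source layer, target layer) pair with target >= source.
        self.dec = {
            (src, tgt): rand(d_model, n_features)
            for src in range(n_layers)
            for tgt in range(src, n_layers)
        }

    def encode(self, residuals):
        """Compute feature activations at each layer from that layer's residual."""
        return [relu(matvec(self.enc[l], residuals[l])) for l in range(self.n_layers)]

    def reconstruct(self, residuals):
        """Approximate each layer's MLP output from features at that layer and earlier."""
        acts = self.encode(residuals)
        outputs = []
        for tgt in range(self.n_layers):
            out = [0.0] * self.d_model
            for src in range(tgt + 1):
                contrib = matvec(self.dec[(src, tgt)], acts[src])
                out = [o + c for o, c in zip(out, contrib)]
            outputs.append(out)
        return outputs


# Usage: three layers, one token position, a 4-dimensional residual stream.
clt = ToyCrossLayerTranscoder(n_layers=3, d_model=4, n_features=8)
residuals = [[0.5, -1.0, 0.25, 2.0] for _ in range(3)]
mlp_hat = clt.reconstruct(residuals)
```

Because each feature writes only to its own layer and later ones, changing the residual stream at the last layer leaves the reconstructions for earlier layers unchanged. That cross-layer structure is what lets a single feature account for computation spread over several layers.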