Adaptive inference and function vectors in deep transformers
Researchers have developed a new theory explaining the internal workings of deep transformers, viewing them as mean-field interacting systems that perform distributed inference. This theory introduces 'function vectors' as internal state representations that allow transformers to infer latent context variables at progressively finer scales through their layers. The research demonstrates that transformer depth and feedforward blocks enable more sophisticated in-context learning algorithms than previously understood.
AI IMPACT: Provides a theoretical framework for understanding and potentially improving the in-context learning capabilities of deep transformer models.