Activation-Based Active Learning for In-Context Learning: Challenges and Insights
Two new research papers explore the mechanisms behind in-context learning in large language models. The first investigates whether transformer activations can be used to optimize in-context sample selection. It finds that MLP outputs do not correlate with performance and suggests Sparse Autoencoders as one direction for future work. The second proposes that stacking self-attention and MLP layers allows a transformer to implicitly update its MLP weights based on the context, which could explain how in-context learning works without any additional training.
AI IMPACT: These papers offer theoretical insights into how LLMs learn from prompts, potentially guiding future model development and fine-tuning strategies.
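
To make the idea of an implicit weight update concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not necessarily the construction used in the paper: it assumes the context only shifts the input that reaches a linear MLP layer from x to x + delta, and it uses a standard linear-algebra identity to absorb that shift into a rank-one change of the weights. The names W, x, delta, and implicit_weight_update are hypothetical.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * vi for w, vi in zip(row, v)) for row in W]


def implicit_weight_update(W, x, delta):
    """Return dW such that (W + dW) x == W (x + delta).

    dW = (W delta) x^T / ||x||^2 is an outer product, so it has rank at most one.
    Assumes x is not the zero vector.
    """
    Wd = matvec(W, delta)
    norm_sq = sum(xi * xi for xi in x)
    return [[wd_i * xj / norm_sq for xj in x] for wd_i in Wd]


random.seed(0)
# Hypothetical values: a 4x3 weight matrix, a query representation x,
# and the shift delta that attending to the context is assumed to add.
W = [[random.gauss(0, 1) for _ in range(3)] for _ in range(4)]
x = [random.gauss(0, 1) for _ in range(3)]
delta = [random.gauss(0, 1) for _ in range(3)]

dW = implicit_weight_update(W, x, delta)
W_patched = [[w + d for w, d in zip(rw, rd)] for rw, rd in zip(W, dW)]

# Original weights applied to the context-shifted input ...
with_context = matvec(W, [xi + di for xi, di in zip(x, delta)])
# ... versus patched weights applied to the input without context.
patched = matvec(W_patched, x)

identity_holds = all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9)
    for a, b in zip(with_context, patched)
)
print(identity_holds)
```

The two outputs agree because (W + dW) x = W x + W delta (x^T x) / ||x||^2 = W (x + delta). In this toy setting, feeding context through the layer is indistinguishable from running the context-free input through slightly different weights, and no gradient step is involved.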