Feature Lottery? A Bifurcation Theory of Concept Emergence
Researchers have developed a bifurcation theory of how neural networks develop structured representations during training. The theory introduces a label-free metric, the beta/beta_c ratio, that can predict the emergence of concepts in real time. The research shows that this metric can identify different transition regimes and even explain phenomena such as grokking, where learning appears to be delayed. The theory also suggests that early training dynamics can predict the final interpretability of features, so they can serve as a practical indicator of training health.
AI IMPACT: Provides a new theoretical framework for understanding and predicting concept emergence in neural networks, potentially improving training efficiency and interpretability.