A Geometric Unification of Concept Learning with Concept Cones
Researchers have developed a geometric framework that unifies supervised and unsupervised concept learning in AI models. This approach views both Concept Bottleneck Models (CBMs) and Sparse Autoencoders (SAEs) as learning linear directions that form concept cones. The study proposes metrics to evaluate how well the concepts discovered by SAEs align with the human-defined concepts used by CBMs, and identifies the sparsity and expansion settings that maximize this alignment.
AI IMPACT: Provides a unified geometric perspective for AI interpretability, offering new metrics to evaluate unsupervised concept discovery.
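
To make the idea of a concept cone concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not one of the metrics proposed in the study: it treats a cone as the set of non-negative combinations of a few direction vectors, and scores how closely a target direction (standing in for a human-defined CBM concept) can be approximated from within the cone generated by other directions (standing in for SAE features). The function name `cone_alignment`, the scoring rule, and the toy vectors are all hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def cone_alignment(directions, target, steps=500):
    """Hypothetical score in [0, 1]: 1.0 means `target` lies inside the cone
    of non-negative combinations of `directions`; 0.0 means the closest point
    of the cone is the origin. Assumes `target` and at least one direction
    are non-zero. This is an illustrative choice, not the study's metric."""
    # Step size 1/L: the sum of squared norms upper-bounds the largest
    # eigenvalue of the Gram matrix, which keeps the iteration stable.
    step = 1.0 / sum(dot(d, d) for d in directions)
    coeffs = [0.0] * len(directions)

    def reconstruct(cs):
        # Weighted sum of the directions, one output coordinate at a time.
        return [sum(c * d[k] for c, d in zip(cs, directions))
                for k in range(len(target))]

    # Projected gradient descent on 0.5 * ||sum_i c_i d_i - target||^2,
    # clipping coefficients at zero to stay inside the cone.
    for _ in range(steps):
        residual = [r - t for r, t in zip(reconstruct(coeffs), target)]
        coeffs = [max(0.0, c - step * dot(d, residual))
                  for c, d in zip(coeffs, directions)]

    residual = [r - t for r, t in zip(reconstruct(coeffs), target)]
    return 1.0 - math.sqrt(dot(residual, residual)) / math.sqrt(dot(target, target))


# Toy stand-ins for SAE feature directions (hypothetical values).
sae_directions = [(1.0, 0.0), (0.0, 1.0)]
inside = cone_alignment(sae_directions, (1.0, 1.0))    # target inside the cone
outside = cone_alignment(sae_directions, (-1.0, 0.0))  # target opposite the cone
```

In this toy setup, `inside` comes out close to 1 and `outside` close to 0, because the first target is a non-negative mix of the two directions while the second points away from both. A realistic comparison would use learned decoder directions and concept vectors in place of the hand-written tuples.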