Brief · PulseAugur

TOOL · arXiv cs.AI · 7h

A Geometric Unification of Concept Learning with Concept Cones

Researchers have developed a geometric framework that unifies supervised and unsupervised concept learning in AI models. The framework treats both Concept Bottleneck Models (CBMs) and Sparse Autoencoders (SAEs) as learning linear directions that form concept cones. The study proposes metrics for how well the concepts discovered by SAEs align with the human-defined concepts used by CBMs, and it identifies the sparsity and expansion settings that maximize this alignment.
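To make the idea of comparing two sets of linear concept directions concrete, here is a minimal sketch. It is an illustration only, not the metric proposed in the study: it assumes each concept is a plain vector and scores alignment as the best cosine similarity between each human-defined (CBM) direction and any discovered (SAE) direction. The function names and the toy vectors are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length so that only its direction matters."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("a zero vector has no direction")
    return [x / norm for x in v]


def cosine(u, v):
    """Cosine similarity between two directions."""
    u, v = normalize(u), normalize(v)
    return sum(a * b for a, b in zip(u, v))


def best_match_alignment(cbm_dirs, sae_dirs):
    """For each CBM direction, the highest cosine similarity to any SAE direction."""
    return [max(cosine(c, s) for s in sae_dirs) for c in cbm_dirs]


def mean_alignment(cbm_dirs, sae_dirs):
    """Average best-match score over all CBM directions (assumed summary statistic)."""
    scores = best_match_alignment(cbm_dirs, sae_dirs)
    return sum(scores) / len(scores)


# Toy data (hypothetical): two human-defined directions, three discovered ones.
cbm_dirs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
sae_dirs = [[2.0, 0.0, 0.0], [0.0, 1.0, 1.0], [0.0, 0.0, -1.0]]

scores = best_match_alignment(cbm_dirs, sae_dirs)
print(scores)
print(mean_alignment(cbm_dirs, sae_dirs))
```

In this toy case the first human-defined direction is recovered exactly by a discovered direction, while the second is only partly recovered, so the mean score falls below 1. Under this sketch, sweeping SAE sparsity or expansion would amount to recomputing the score for each trained SAE and comparing the results.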

IMPACT Provides a unified geometric perspective for AI interpretability, offering new metrics to evaluate unsupervised concept discovery.

Concept Bottleneck Models
Sparse Autoencoders
Gianni Franchi