Cascaded Sparse Autoencoders Learn Multi-Level Visual Concepts in Multimodal LLMs
Researchers have developed cascaded sparse autoencoders (CSAEs) to better interpret the visual representations within multimodal large language models (MLLMs). Unlike previous methods that produced flat feature dictionaries, CSAEs learn hierarchical visual concepts by training a second-level SAE on the decoder weights of a first-level SAE. This approach allows for the creation of "concepts of concepts" without the drawbacks of nested or naively stacked SAEs. Experiments on models like Qwen3-VL, Gemma-3, and LLaVA demonstrate that CSAEs enhance hierarchical concept coherence and enable effective group-level interventions in MLLM outputs. AI
IMPACT This research offers a novel method for interpreting complex visual concepts within multimodal LLMs, potentially leading to more transparent and steerable AI systems.