PulseAugur

eNTK eigenanalysis surfaces features in trained neural networks

Researchers report evidence that eigenanalysis of the empirical Neural Tangent Kernel (eNTK) can surface feature directions in trained neural networks. The method was tested in three increasingly realistic settings. In a 1-layer MLP and a 1-layer Transformer, the top eigenspaces of the eNTK aligned with ground-truth or interpretable features. In a pretrained language model, Gemma-3-270M, eNTK eigendirections aligned with grammatical features better than PCA on the model's activations did. The authors suggest eNTK eigenanalysis as a potential tool for mechanistic interpretability.
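The summary does not spell out the procedure. What follows is a minimal, hypothetical sketch of eNTK eigenanalysis under our own assumptions, not the paper's code. It uses a toy scalar-output MLP f(x) = sum_j a_j * tanh(w_j . x) with random, untrained weights, whereas the paper studies trained networks. The sketch stacks per-example parameter gradients into a Jacobian J, forms the eNTK Gram matrix K = J J^T over a batch, and eigendecomposes K with a small pure-Python Jacobi solver.

The helper names (mlp_output_and_grad, empirical_ntk, jacobi_eigh, top_features) are illustrative. The sketch also maps each sample-space eigenvector v to a unit parameter-space direction u = J^T v / sqrt(lambda). That mapping is one plausible reading of "feature direction", not necessarily the authors' definition.

```python
# Hypothetical toy sketch of eNTK eigenanalysis (not the paper's code).
import math
import random


def mlp_output_and_grad(W, a, x):
    """Toy MLP f(x) = sum_j a_j * tanh(w_j . x).

    Returns f(x) and df/dtheta flattened as [a_1..a_H, w_11..w_HD]."""
    h = [math.tanh(sum(w * xi for w, xi in zip(wj, x))) for wj in W]
    out = sum(aj * hj for aj, hj in zip(a, h))
    grad_a = list(h)                                # df/da_j = h_j
    grad_W = [aj * (1.0 - hj * hj) * xi             # df/dw_jk = a_j (1 - h_j^2) x_k
              for aj, hj in zip(a, h) for xi in x]
    return out, grad_a + grad_W


def empirical_ntk(W, a, xs):
    """eNTK Gram matrix K[i][j] = <df(x_i)/dtheta, df(x_j)/dtheta>, plus the Jacobian rows."""
    J = [mlp_output_and_grad(W, a, x)[1] for x in xs]
    K = [[sum(p * q for p, q in zip(Ji, Jj)) for Jj in J] for Ji in J]
    return K, J


def jacobi_eigh(A, max_sweeps=100, tol=1e-20):
    """Symmetric eigendecomposition via cyclic Jacobi rotations.

    Returns (eigenvalues, eigenvectors) sorted by descending eigenvalue;
    eigenvectors[k] is the unit eigenvector for eigenvalues[k]."""
    n = len(A)
    A = [row[:] for row in A]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation G (G[p][p]=G[q][q]=c, G[p][q]=s, G[q][p]=-s)
                # chosen so that (G^T A G)[p][q] = 0.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for M in (A, V):                    # columns: M <- M G
                    for k in range(n):
                        mp, mq = M[k][p], M[k][q]
                        M[k][p], M[k][q] = c * mp - s * mq, s * mp + c * mq
                for k in range(n):                  # rows: A <- G^T A
                    ap, aq = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * ap - s * aq, s * ap + c * aq
    order = sorted(range(n), key=lambda k: A[k][k], reverse=True)
    return [A[k][k] for k in order], [[V[i][k] for i in range(n)] for k in order]


def top_features(W, a, xs, k=2):
    """Top-k eNTK eigenpairs (lam, v) plus u = J^T v / sqrt(lam), a unit
    parameter-space direction (an illustrative choice; assumes lam > 0)."""
    K, J = empirical_ntk(W, a, xs)
    vals, vecs = jacobi_eigh(K)
    feats = []
    for lam, v in zip(vals[:k], vecs[:k]):
        u = [sum(v[i] * J[i][m] for i in range(len(xs))) / math.sqrt(lam)
             for m in range(len(J[0]))]
        feats.append((lam, v, u))
    return feats


if __name__ == "__main__":
    rng = random.Random(0)
    D, H, N = 3, 4, 6                               # input dim, hidden width, batch size
    W = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(H)]
    a = [rng.gauss(0, 1) for _ in range(H)]
    xs = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(N)]
    for lam, v, _u in top_features(W, a, xs, k=2):
        print(f"lambda={lam:.4f}  v={[round(vi, 3) for vi in v]}")
```

Running the script prints the two largest eNTK eigenvalues and their sample-space eigenvectors. In practice one would use a trained network, an autodiff library for the Jacobian, and a library eigensolver. The hand-written Jacobi routine only keeps the sketch dependency-free. For a network with C outputs, the eNTK over N inputs becomes an (N*C) x (N*C) matrix rather than N x N.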

IMPACT Introduces a novel technique for understanding internal model representations, potentially aiding in interpretability research.

RANK_REASON Academic paper detailing a new method for analyzing neural network features.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

  1. arXiv cs.LG · TIER_1 · English (EN) · Jennifer Lin

    Feature Identification via the Empirical NTK

    arXiv:2510.00468v4 · Announce Type: replace · Abstract: We provide evidence that eigenanalysis of the empirical neural tangent kernel (eNTK) can surface feature directions in trained neural networks. Across three increasingly realistic settings -- a 1-layer MLP trained on modular add…