eNTK eigenanalysis surfaces features in trained neural networks

By PulseAugur Editorial · [1 source] · 2026-05-07 04:00

Researchers report evidence that eigenanalysis of the empirical neural tangent kernel (eNTK) can surface feature directions in trained neural networks. In tests on a 1-layer MLP and a 1-layer Transformer, the top eigenspaces of the eNTK aligned with ground-truth or interpretable features. For the pretrained language model Gemma-3-270M, eNTK eigendirections aligned with grammatical features better than PCA on model activations, which suggests eNTK eigenanalysis could serve as a tool for mechanistic interpretability.
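
To make the idea concrete, here is a minimal sketch of eNTK eigenanalysis. It is not the paper's implementation. It assumes a toy scalar-output tanh MLP with random (untrained) weights and random inputs, and the function names are hypothetical. The eNTK is built as the Gram matrix of per-example parameter gradients, then eigendecomposed. In practice one would use a trained model and compare the top eigenvectors against known feature labels.

```python
import numpy as np

def mlp_forward(params, X):
    """Toy scalar-output MLP: f(x) = w2 . tanh(W1 x). Illustrative only."""
    W1, w2 = params
    return np.tanh(X @ W1.T) @ w2

def per_example_gradients(params, X):
    """Jacobian of f with respect to all parameters, one row per input."""
    W1, w2 = params
    H = np.tanh(X @ W1.T)                      # hidden activations, shape (n, h)
    dH = (1.0 - H**2) * w2                     # df / d(pre-activation), shape (n, h)
    grad_W1 = dH[:, :, None] * X[:, None, :]   # df / dW1, shape (n, h, d)
    grad_w2 = H                                # df / dw2, shape (n, h)
    return np.concatenate([grad_W1.reshape(len(X), -1), grad_w2], axis=1)

def entk_eigendecomposition(params, X):
    """Build the eNTK K = J J^T and return it with sorted eigenpairs."""
    J = per_example_gradients(params, X)
    K = J @ J.T                                # (n, n) kernel over data points
    eigvals, eigvecs = np.linalg.eigh(K)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]          # reorder to descending
    return K, eigvals[order], eigvecs[:, order]

# Assumed toy setup: 32 random inputs of dimension 4, 8 hidden units.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))
params = (rng.normal(size=(8, 4)) / 2.0, rng.normal(size=8) / np.sqrt(8))

K, eigvals, eigvecs = entk_eigendecomposition(params, X)
top_directions = eigvecs[:, :3]                # leading eNTK eigenvectors
```

Each column of `top_directions` assigns one coefficient per data point, so a simple alignment check would correlate those coefficients with a candidate feature evaluated on the same inputs.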

IMPACT Introduces a novel technique for understanding internal model representations, potentially aiding in interpretability research.

RANK_REASON Academic paper detailing a new method for analyzing neural network features.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.LG TIER_1 English(EN) · Jennifer Lin · 2026-05-07 04:00

Feature Identification via the Empirical NTK

arXiv:2510.00468v4 Announce Type: replace Abstract: We provide evidence that eigenanalysis of the empirical neural tangent kernel (eNTK) can surface feature directions in trained neural networks. Across three increasingly realistic settings -- a 1-layer MLP trained on modular add…
