Researchers have identified two distinct populations among function-vector (FV) heads in large language models, challenging the assumption that these heads form a homogeneous group. Ranking heads with a sign-preserving criterion rather than by magnitude alone shows that FV heads either push the correct logits up (writers) or push them down (cancellers). The split was observed across multiple model families and scales, and zero-ablating the cancellers improved accuracy.
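To make the difference between the two criteria concrete, here is a minimal Python sketch. It is not the paper's method: the head names, the effect values, and the helpers `rank_by_magnitude` and `split_by_sign` are all hypothetical, and the numbers are invented for illustration. It only shows why a magnitude-only ranking can mix writers and cancellers in one list, while keeping the sign separates them.

```python
# Hypothetical per-head effects on the correct-answer logit.
# These values are made up for illustration, not taken from the paper.
head_effects = {
    "L9.H4": 2.1,
    "L11.H7": -1.8,
    "L12.H2": 0.9,
    "L14.H0": -0.2,
    "L15.H3": 0.05,
}


def rank_by_magnitude(effects, k):
    """Magnitude-only ranking: top-k heads by |effect|, sign discarded."""
    return sorted(effects, key=lambda h: abs(effects[h]), reverse=True)[:k]


def split_by_sign(effects, heads):
    """Sign-preserving split: writers raise the correct logit, cancellers lower it."""
    writers = [h for h in heads if effects[h] > 0]
    cancellers = [h for h in heads if effects[h] < 0]
    return writers, cancellers


top_heads = rank_by_magnitude(head_effects, k=3)
writers, cancellers = split_by_sign(head_effects, top_heads)

print("top heads by magnitude:", top_heads)
print("writers:", writers)
print("cancellers:", cancellers)
```

In this toy data the magnitude-only top three contains one head with a negative effect, which the sign-preserving split assigns to the cancellers rather than treating it like the other two.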
IMPACT Offers a more nuanced picture of how LLMs process information, which could inform future interpretability work and model design.
RANK_REASON Academic paper detailing novel findings about LLM internal mechanisms.
AI-generated summary · Google Gemini · from 1 source.