Researchers have developed a new method called Logit-Contribution Scoring (LOCOS) to identify non-literal retrieval heads in large language models. Unlike previous methods that focused on literal token matching, LOCOS analyzes the output-value circuit of attention heads to understand how they synthesize information from context. The researchers report that this approach detects heads responsible for non-literal retrieval more effectively across several model families, including Qwen3, Gemma-3, and OLMo-3.1. Ablating the identified heads causes significant performance drops on tasks that require synthesis.
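To make the idea of scoring a head through its output-value circuit more concrete, here is a minimal sketch of generic direct logit attribution for a single attention head. It is not the LOCOS definition, which this summary does not specify. The function name `head_logit_contribution`, the argument names, and the choice to ignore layer normalization and biases are all assumptions made for illustration.

```python
import numpy as np

def head_logit_contribution(attn, x, W_V, W_O, W_U, target_token):
    """Direct contribution of one attention head to the logit of `target_token`.

    Hypothetical helper for illustration only (not the paper's scoring rule).
    attn: (seq,) attention weights of the final query position over the context
    x:    (seq, d_model) residual-stream inputs read by the head
    W_V:  (d_model, d_head) value projection
    W_O:  (d_head, d_model) output projection
    W_U:  (d_model, vocab) unembedding matrix
    Assumes no layer norm or bias terms between the head and the logits.
    """
    # Output-value circuit applied to every source position: x W_V W_O
    ov_out = x @ W_V @ W_O                      # (seq, d_model)
    # Scale each source position by how much the head attends to it
    weighted = attn[:, None] * ov_out           # (seq, d_model)
    # Project onto the unembedding direction of the target token
    per_position = weighted @ W_U[:, target_token]  # (seq,)
    return per_position.sum(), per_position

# Toy usage with random weights (shapes only; values are not meaningful)
rng = np.random.default_rng(0)
seq, d_model, d_head, vocab = 6, 16, 4, 32
score, per_position = head_logit_contribution(
    attn=np.full(seq, 1.0 / seq),
    x=rng.normal(size=(seq, d_model)),
    W_V=rng.normal(size=(d_model, d_head)),
    W_O=rng.normal(size=(d_head, d_model)),
    W_U=rng.normal(size=(d_model, vocab)),
    target_token=7,
)
```

Under these simplifying assumptions the score is linear in the head's output, so zeroing the head would lower the target logit by exactly `score`. That is the intuition behind checking a head's importance by ablating it.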
IMPACT Provides a more accurate method for interpreting how LLMs synthesize information, which is crucial for understanding and improving long-context capabilities.
RANK_REASON Academic paper introducing a new method for analyzing LLM behavior.
AI-generated summary · Google Gemini · from 1 source.