Researchers have identified a generalized 'Latent Evaluator' sub-graph in large language models such as Gemma-3, Qwen2.5, and Llama-3 that is responsible for making judgments. The sub-graph sits in the mid-to-late multi-layer perceptron (MLP) layers and can be probed causally with Position-aware Edge Attribution Patching (PEAP). The study found that this core judging mechanism is shared across tasks and formats, whereas output formatting relies on fragile, format-specific terminal branches, which leads to inconsistencies when the output format changes.
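To make the patching idea concrete, here is a minimal sketch of the first-order approximation that attribution patching methods in general rely on: an edge's effect is estimated as the difference between its clean and corrupted activations, multiplied by the gradient of the metric at the downstream input. This is not the paper's PEAP implementation. The function names, the linear toy metric, and the reading of "position-aware" as keeping one score per token position are all assumptions made for illustration.

```python
import numpy as np

def edge_attribution_scores(clean_out, corrupt_out, grad_in):
    """First-order edge scores, one per token position (hypothetical helper).

    clean_out, corrupt_out: (positions, d_model) upstream outputs on the
        clean and corrupted prompts.
    grad_in: (positions, d_model) gradient of the metric with respect to the
        downstream input, taken on the corrupted run.
    """
    # Sum over the hidden dimension only, so each position keeps its own score.
    return np.sum((clean_out - corrupt_out) * grad_in, axis=-1)

def toy_metric(x, w):
    # Assumed stand-in for a "judgment logit": a linear readout of the input.
    return float(np.sum(x * w))

rng = np.random.default_rng(0)
positions, d_model = 4, 3
w = rng.normal(size=(positions, d_model))
clean = rng.normal(size=(positions, d_model))
corrupt = rng.normal(size=(positions, d_model))

# For a linear metric the gradient with respect to the input is just w.
scores = edge_attribution_scores(clean, corrupt, w)

def exact_patch_effect(p):
    # Ground truth: patch the clean activation in at position p and re-measure.
    patched = corrupt.copy()
    patched[p] = clean[p]
    return toy_metric(patched, w) - toy_metric(corrupt, w)

exact = np.array([exact_patch_effect(p) for p in range(positions)])
print(np.allclose(scores, exact))
```

Because the toy metric is linear, the approximation matches true patching exactly here. In a real transformer the metric is nonlinear, so the scores are only estimates and would normally be checked against direct patching.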
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Reveals internal mechanisms of LLM judgment, potentially improving benchmark reliability and understanding model behavior.
RANK_REASON The cluster contains an academic paper detailing a new mechanistic understanding of LLM behavior.