Researchers have identified a generalized 'Latent Evaluator' sub-graph in large language models such as Gemma-3, Qwen2.5, and Llama-3 that is responsible for making judgments. The sub-graph sits in the mid-to-late-layer multi-layer perceptrons (MLPs) and can be investigated causally with Position-aware Edge Attribution Patching (PEAP). The study found that this core judging mechanism is shared across tasks and formats, whereas output formatting relies on fragile, format-specific terminal branches, which leads to inconsistencies when the output format changes.
Impact: Reveals the internal mechanisms of LLM judgment, potentially improving benchmark reliability and the understanding of model behavior.
Ranking rationale: The cluster contains an academic paper detailing a new mechanistic understanding of LLM behavior.
AI-generated summary · Google Gemini · from 1 source.