LLM judge circuits revealed in Gemma, Qwen, Llama models

By the PulseAugur editorial team · [1 source] · 2026-05-15 14:57

Researchers have identified a generalized 'Latent Evaluator' sub-graph in large language models such as Gemma-3, Qwen2.5, and Llama-3 that is responsible for making judgments. The sub-graph sits in the mid-to-late multi-layer perceptron (MLP) layers and can be causally investigated using Position-aware Edge Attribution Patching (PEAP). The study found that this core judging mechanism is shared across tasks and formats, whereas output formatting relies on fragile, format-specific terminal branches, which leads to inconsistencies when the output format changes.
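For readers unfamiliar with attribution patching, the idea behind this family of methods can be sketched on a toy network. This is a minimal illustration of plain, node-level attribution patching, not the paper's PEAP: it has no position awareness and no edge-level scoring. The weights, inputs, and function names are all made up for the example. The score for each hidden unit is the first-order estimate (corrupted activation minus clean activation) times the gradient of the metric, which here happens to match exact activation patching because the readout is linear.

```python
# Toy illustration of attribution patching (NOT the paper's PEAP).
# All weights, inputs, and names are hypothetical.

def relu(values):
    return [max(0.0, v) for v in values]

def hidden(W1, x):
    # Hidden activations of a one-hidden-layer ReLU network.
    return relu([sum(w * xi for w, xi in zip(row, x)) for row in W1])

def readout(w2, h):
    # Scalar metric: a linear readout of the hidden activations.
    return sum(w * hi for w, hi in zip(w2, h))

W1 = [[1.0, -1.0], [0.5, 2.0], [-1.0, 1.0]]
w2 = [2.0, -1.0, 0.5]

x_clean = [1.0, 0.0]    # "clean" prompt stand-in
x_corrupt = [0.0, 1.0]  # "corrupted" prompt stand-in

h_clean = hidden(W1, x_clean)
h_corrupt = hidden(W1, x_corrupt)

# Gradient of the metric w.r.t. each hidden activation.
# For a linear readout this is simply w2.
grad = w2

# First-order estimate of the effect of patching each unit.
attribution = [(hc - h) * g for hc, h, g in zip(h_corrupt, h_clean, grad)]

def patched_effect(j):
    # Exact effect: actually swap unit j's activation and re-run the readout.
    h = list(h_clean)
    h[j] = h_corrupt[j]
    return readout(w2, h) - readout(w2, h_clean)

exact = [patched_effect(j) for j in range(len(h_clean))]
print(attribution, exact)
```

In a real transformer the metric is nonlinear in the activations, so the estimate is only an approximation. Its appeal is that it scores every component with a small, fixed number of forward and backward passes instead of one patched run per component.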

Impact: Reveals internal mechanisms of LLM judgment, which could improve benchmark reliability and the understanding of model behavior.

Ranking rationale: The cluster contains an academic paper detailing a new mechanistic understanding of LLM behavior.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CL TIER_1 English(EN) · Simon Ostermann · 2026-05-15 14:57

Judge Circuits

LLM-as-a-judge has become the dominant paradigm for grading model outputs at scale, yet the same model assigns systematically different scores when its output format changes (e.g., a 1-5 rating vs. a True/False label). Existing diagnoses of these format-induced inconsistencies st…
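The format-induced inconsistency described in this excerpt can be made concrete with a small measurement. This is a hedged sketch, not the paper's evaluation protocol: the threshold that maps a 1-5 rating to a binary verdict, the function names, and the sample data are all assumptions chosen for illustration.

```python
# Illustrative only: quantify how often a judge's 1-5 rating and its
# True/False label disagree on the same items. The threshold and the
# sample data below are hypothetical, not taken from the paper.

def rating_to_label(rating, threshold=3):
    # Assumed mapping: ratings strictly above the threshold count as True.
    return rating > threshold

def disagreement_rate(ratings, labels, threshold=3):
    if len(ratings) != len(labels):
        raise ValueError("ratings and labels must have the same length")
    if not ratings:
        raise ValueError("need at least one judged item")
    mismatches = sum(
        rating_to_label(r, threshold) != label
        for r, label in zip(ratings, labels)
    )
    return mismatches / len(ratings)

# Same five items judged twice by the same model, once per output format.
ratings = [5, 4, 2, 3, 1]
labels = [True, False, False, True, False]

rate = disagreement_rate(ratings, labels)
print(f"disagreement rate: {rate:.2f}")
```

The choice of threshold is itself a judgment call, so it is worth reporting the rate at more than one threshold before concluding that a judge is inconsistent.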
