PulseAugur

LLM judge circuits revealed in Gemma, Qwen, Llama models

Researchers have identified a generalized "Latent Evaluator" sub-graph in large language models such as Gemma-3, Qwen2.5, and Llama-3 that is responsible for making judgments. The sub-graph sits in the mid-to-late multi-layer perceptron (MLP) layers and can be causally investigated using Position-aware Edge Attribution Patching (PEAP). The study found that this core judging mechanism is shared across tasks and formats, whereas output formatting relies on fragile, format-specific terminal branches, which leads to inconsistencies when the output format changes.
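To give a rough sense of what edge attribution patching computes, here is a minimal sketch of the generic first-order idea, not the paper's PEAP implementation. It assumes a hypothetical toy model whose metric is a linear read-out of per-position activations; the names `edge_attribution_scores`, `metric`, and `patch_effect` are illustrative only. Each (position, node) score estimates how much the metric would change if that activation were swapped from a clean run to a corrupted run.

```python
import numpy as np


def edge_attribution_scores(clean_acts, corrupt_acts, grad_wrt_acts):
    """First-order estimate of the metric change caused by patching each
    (position, node) activation from its clean value to its corrupted value."""
    return (corrupt_acts - clean_acts) * grad_wrt_acts


# Hypothetical toy setup: 3 token positions, 4 upstream nodes.
rng = np.random.default_rng(0)
n_positions, n_nodes = 3, 4

# Assumed linear read-out: metric = sum(weights * activations).
weights = rng.normal(size=(n_positions, n_nodes))


def metric(acts):
    return float(np.sum(weights * acts))


clean_acts = rng.normal(size=(n_positions, n_nodes))
corrupt_acts = rng.normal(size=(n_positions, n_nodes))

# For a linear read-out, the gradient of the metric with respect to each
# activation is simply the corresponding weight.
scores = edge_attribution_scores(clean_acts, corrupt_acts, weights)


def patch_effect(pos, node):
    """Ground truth: actually patch one activation and re-measure the metric."""
    patched = clean_acts.copy()
    patched[pos, node] = corrupt_acts[pos, node]
    return metric(patched) - metric(clean_acts)


# Locate the (position, node) pair with the largest estimated effect.
top_pos, top_node = np.unravel_index(np.argmax(np.abs(scores)), scores.shape)
print("largest estimated effect at position", top_pos, "node", top_node)
```

In this linear toy the estimate matches the true patching effect exactly; in a real transformer the gradient term is only a local approximation, which is why such scores are usually validated with actual interventions.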

Impact: Reveals internal mechanisms of LLM judgment, potentially improving benchmark reliability and understanding of model behavior.

Ranking rationale: The cluster contains an academic paper detailing a new mechanistic understanding of LLM behavior.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · From 1 source. How we write summaries →

Sources [1]

  1. arXiv cs.CL TIER_1 English (EN) · Simon Ostermann

    Judge Circuits

    LLM-as-a-judge has become the dominant paradigm for grading model outputs at scale, yet the same model assigns systematically different scores when its output format changes (e.g., a 1-5 rating vs. a True/False label). Existing diagnoses of these format-induced inconsistencies st…