LLM judge circuits revealed in Gemma, Qwen, Llama models

By PulseAugur Editorial · Summary by gemini-2.5-flash-lite from 1 source

Researchers have identified a generalized 'Latent Evaluator' sub-graph in large language models such as Gemma-3, Qwen2.5, and Llama-3 that is responsible for making judgments. The sub-graph sits in the mid-to-late multi-layer perceptron (MLP) layers and can be probed causally with Position-aware Edge Attribution Patching (PEAP). The study found that this core judging mechanism is shared across tasks and output formats, but the final formatting of the verdict depends on fragile, format-specific terminal branches. Those branches explain why the same model gives inconsistent judgments when the output format changes.
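To make the idea of attribution patching concrete, here is a minimal sketch of the generic first-order technique on a toy network. It is not the paper's PEAP method, which this summary does not describe in detail. The toy weights `W1` and `w2`, the two inputs, and every function name are hypothetical and chosen only for illustration. Each hidden unit is scored by the change in its activation between a clean and a corrupted input, multiplied by the gradient of the output with respect to that activation.

```python
# Toy attribution patching on a one-hidden-layer ReLU network.
# All weights, inputs, and names are illustrative assumptions,
# not taken from the paper.

def relu(values):
    return [max(0.0, v) for v in values]

def hidden(W1, x):
    # Hidden activations h = relu(W1 @ x), written out with plain lists.
    return relu([sum(w * xi for w, xi in zip(row, x)) for row in W1])

def output(w2, h):
    # Linear readout y = w2 . h
    return sum(w * hi for w, hi in zip(w2, h))

def attribution_patching_scores(W1, w2, x_clean, x_corrupt):
    """Score each hidden unit by (h_corrupt - h_clean) * dy/dh.

    For this linear readout dy/dh_i is simply w2[i], so the
    first-order estimate matches true activation patching exactly.
    In a deep model it is only an approximation.
    """
    h_clean = hidden(W1, x_clean)
    h_corrupt = hidden(W1, x_corrupt)
    return [(hc - hk) * g for hc, hk, g in zip(h_corrupt, h_clean, w2)]

W1 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]]
w2 = [2.0, -1.0, 0.5]
x_clean = [1.0, 0.0]
x_corrupt = [0.0, 1.0]

scores = attribution_patching_scores(W1, w2, x_clean, x_corrupt)
y_clean = output(w2, hidden(W1, x_clean))
y_corrupt = output(w2, hidden(W1, x_corrupt))
print(scores, y_corrupt - y_clean)
```

In this toy case the scores sum to the total change in the output, which is a quick sanity check. Units with large absolute scores are the ones a circuit analysis would keep.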


IMPACT Reveals the internal mechanisms behind LLM judgments, which could improve benchmark reliability and the understanding of model behavior.

RANK_REASON The cluster contains an academic paper detailing a new mechanistic understanding of LLM behavior.


COVERAGE [1]

arXiv cs.CL TIER_1 · Simon Ostermann · 2026-05-15 14:57

Judge Circuits

LLM-as-a-judge has become the dominant paradigm for grading model outputs at scale, yet the same model assigns systematically different scores when its output format changes (e.g., a 1-5 rating vs. a True/False label). Existing diagnoses of these format-induced inconsistencies st…
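The format-induced inconsistency described above can be measured with a simple check. This is a minimal sketch, not the paper's metric. It assumes the same judge graded the same items once on a 1-5 scale and once with a True/False label, and that a rating above a chosen threshold should correspond to True. The function name, the threshold of 3, and the sample data are hypothetical.

```python
def format_disagreement_rate(ratings, labels, threshold=3):
    """Fraction of items where the 1-5 rating and the True/False
    label from the same judge disagree.

    Assumption: a rating strictly above `threshold` counts as True.
    """
    if len(ratings) != len(labels):
        raise ValueError("ratings and labels must have the same length")
    if not ratings:
        raise ValueError("need at least one item")
    disagreements = sum(
        (rating > threshold) != bool(label)
        for rating, label in zip(ratings, labels)
    )
    return disagreements / len(ratings)

# Made-up judgments for five items, for illustration only.
ratings = [5, 4, 2, 3, 1]
labels = [True, False, False, True, False]

rate = format_disagreement_rate(ratings, labels)
print(f"disagreement rate: {rate:.2f}")
```

A rate of zero would mean the two formats agree under the assumed threshold. The result depends on where the threshold is set, so it is worth reporting the rate for more than one threshold.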

