LLMs show significant self-bias when grading each other, study finds

By PulseAugur Editorial · [1 source] · 2026-06-28 00:10

An open evaluation of 55 large language models found that judges score models from their own family differently from other models. Each model blind-graded the others' answers, with self-judgments excluded, and every model family with enough data showed bias in how it scored its own siblings. Qwen judges favored Qwen models by approximately 0.9 points, while Mistral judges penalized their own family by about 1.0 point.

IMPACT Reveals potential biases in LLM evaluations, suggesting that model performance metrics may be skewed by self-preference.

RANK_REASON The cluster describes findings from an independent evaluation of multiple LLMs, akin to academic research.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

r/LocalLLaMA TIER_1 English(EN) · /u/Silver_Raspberry_811 · 2026-06-28 00:10

I had 55 LLMs blind-grade each other (22k judgments, all open). Every model family with enough data is biased toward its own siblings. Qwen judges favor Qwen by ~0.9 points. Mistral penalizes its own by ~1.0.

I have been running an open evaluation setup where N models answer the same prompt, then blind-grade each other in an N x N matrix with self-judgments excluded. No single privileged judge. So far: 286 evaluations, 198 hand-written questions, 22,2…
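The excerpt does not say how the family bias figures were computed. As a minimal sketch of one way such a number could be derived from an N x N judgment matrix, the Python below compares the mean score a family's judges give to their siblings against the mean score they give to everyone else, skipping self-judgments. The function name `family_bias`, the model names, the family labels, and all scores are hypothetical and exist only for illustration; they are not taken from the original post's data or code.

```python
from collections import defaultdict
from statistics import mean


def family_bias(scores, family_of):
    """Return {family: mean in-family score - mean out-of-family score}.

    scores:    dict mapping (judge, candidate) -> numeric score
    family_of: dict mapping model name -> family label
    This definition of bias is an assumption, not the post's stated method.
    """
    in_family = defaultdict(list)   # scores given to siblings
    out_family = defaultdict(list)  # scores given to other families
    for (judge, candidate), score in scores.items():
        if judge == candidate:
            continue  # self-judgments are excluded, as in the described setup
        fam = family_of[judge]
        if family_of[candidate] == fam:
            in_family[fam].append(score)
        else:
            out_family[fam].append(score)
    # Only report families that have both kinds of judgments.
    return {
        fam: mean(in_family[fam]) - mean(out_family[fam])
        for fam in in_family
        if out_family[fam]
    }


# Hypothetical toy data: two families, two models each.
family_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
scores = {
    ("a1", "a2"): 8.0, ("a1", "b1"): 7.0, ("a1", "b2"): 7.0,
    ("a2", "a1"): 9.0, ("a2", "b1"): 7.0, ("a2", "b2"): 8.0,
    ("b1", "b2"): 6.0, ("b1", "a1"): 7.0, ("b1", "a2"): 7.0,
    ("b2", "b1"): 6.0, ("b2", "a1"): 8.0, ("b2", "a2"): 6.0,
}

bias = family_bias(scores, family_of)
print(bias)  # {'A': 1.25, 'B': -1.0}
```

In this toy data family A favors its siblings and family B penalizes them, mirroring the two directions reported for Qwen and Mistral. A raw in-family versus out-of-family gap like this also absorbs genuine quality differences between families, so a more careful analysis would likely compare each score against what judges from other families gave the same answer.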

