LLM Self-Preference Bias: How Anonymized Peer Review Fixes It
A developer found that the LLM panel they used for evaluation showed a significant self-preference bias: each model rated its own generated outputs above those of other models, regardless of quality. This bias, documented in a NeurIPS paper, means that models give higher scores to outputs that match their own writing style. The developer also identified verbosity bias and position bias, in which longer answers and answers presented earlier were unfairly favored. Attempts to correct these biases through prompt engineering proved ineffective, because the models are not aware of their own preferences.
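Position bias of this kind can be checked with a simple swap test: ask the judge twice, with the two answers in opposite order, and accept only verdicts that survive the swap. The sketch below is illustrative, not the developer's method. It assumes a hypothetical pairwise judge interface (`PairJudge`) that returns "first" or "second", and the two toy judges stand in for real model calls.

```python
from typing import Callable, Optional

# Hypothetical pairwise judge interface: (question, first, second) -> "first" | "second".
PairJudge = Callable[[str, str, str], str]


def order_robust_verdict(judge: PairJudge, question: str, a: str, b: str) -> Optional[str]:
    """Return "a" or "b" only if the judge picks the same answer in both orders.

    None means the verdict flipped when the positions were swapped, which
    signals position bias rather than a genuine preference.
    """
    forward = "a" if judge(question, a, b) == "first" else "b"   # a shown first
    backward = "b" if judge(question, b, a) == "first" else "a"  # b shown first
    return forward if forward == backward else None


def always_first(question: str, first: str, second: str) -> str:
    """Toy judge with pure position bias (hypothetical, for illustration)."""
    return "first"


def prefers_longer(question: str, first: str, second: str) -> str:
    """Toy judge with pure verbosity bias (hypothetical, for illustration)."""
    return "first" if len(first) >= len(second) else "second"


if __name__ == "__main__":
    print(order_robust_verdict(always_first, "Q?", "short", "a much longer answer"))    # None
    print(order_robust_verdict(prefers_longer, "Q?", "short", "a much longer answer"))  # b
```

The swap test filters out the position-biased judge, but the verbosity-biased judge passes it, because it prefers the longer answer consistently in both orders. Position and verbosity bias therefore need separate checks.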
IMPACT: Highlights a critical flaw in automated LLM evaluation that could skew model development and deployment decisions.