Brief · PulseAugur

RESEARCH · arXiv cs.CL · 22h · [2 sources]

Multimodal Evaluator Preference Collapse: Cross-Modal Contagion in Self-Evolving Agents

A new research paper examines "Evaluator Preference Collapse" (EPC) in self-evolving AI agents and finds that multimodal settings significantly amplify this bias. When GPT-4o was used to evaluate DeepSeek-chat, a single strategy captured 48.4% of the weight, a 3.2x increase over text-only evaluation. The study also identifies "cross-modal contagion," in which preferences learned in one modality transfer to another modality and negatively affect it. Self-evaluation proved nearly immune to contagion, while cross-model evaluation was the primary risk factor.
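To make the headline figures concrete, the sketch below shows one way a "dominant strategy share" could be computed from a set of strategy weights, and what text-only share the 3.2x figure would imply. The brief does not give the paper's exact metric, so everything here is an assumption: the function names, the strategy labels, and all weights other than the 48.4% top share are hypothetical, and the 3.2x figure is read as a ratio of dominant shares.

```python
def dominant_share(weights):
    """Return (strategy, share) for the strategy holding the largest
    fraction of the total weight. `weights` maps strategy name -> weight."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    top = max(weights, key=weights.get)
    return top, weights[top] / total


def implied_baseline_share(amplified_share, ratio):
    """Back out the baseline share, ASSUMING the reported ratio is
    amplified_share / baseline_share (an interpretation, not confirmed
    by the brief)."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return amplified_share / ratio


# Hypothetical multimodal weights; only the 48.4 top value comes from the brief.
multimodal_weights = {
    "strategy_a": 48.4,
    "strategy_b": 20.0,
    "strategy_c": 16.6,
    "strategy_d": 15.0,
}

top_name, top_share = dominant_share(multimodal_weights)
text_only_share = implied_baseline_share(top_share, 3.2)
print(f"dominant strategy: {top_name}, share: {top_share:.1%}")
print(f"implied text-only dominant share: {text_only_share:.1%}")
```

Under that reading, the text-only dominant share would be roughly 15%, so the multimodal setting moves a single strategy from a modest lead to nearly half of the total weight.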

IMPACT Highlights potential biases in AI evaluation, particularly when one model evaluates another model's multimodal outputs, and suggests that evaluation frameworks for self-evolving agents need careful design.

GPT-4o
DeepSeek Chat
Evaluator Preference Collapse
cross-modal contagion
DashScope
MM-EPC