Brief · PulseAugur

RESEARCH · arXiv cs.AI · English (EN) · 19h · [2 sources]

Mitigating Perceptual Judgment Bias in Multimodal LLM-as-a-Judge via Perceptual Perturbation and Reward Modeling

Researchers have identified a significant bias in multimodal large language models (MLLMs) used as judges. These judges often favor a plausible-sounding text narrative over a response that correctly describes the visual content, a phenomenon the authors term Perceptual Judgment Bias. To counter it, they built a dataset and training framework that use minimally edited counterfactual responses to isolate perceptual errors and then train judges to ground their verdicts in what the image actually shows.
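The brief does not give the dataset format or the paper's evaluation metric, so the following is only a minimal sketch of the counterfactual idea under stated assumptions. It assumes each example pairs a perceptually correct response with a copy in which a single visual detail has been edited, and it measures how often a judge scores the correct response strictly higher. Every name here (`CounterfactualPair`, `perturb`, `pairwise_accuracy`, and both toy judges) is hypothetical and is not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CounterfactualPair:
    """Hypothetical record: one image, a correct response, and a minimally edited copy."""
    image_id: str
    question: str
    correct: str
    perturbed: str


def perturb(response: str, original_fact: str, wrong_fact: str) -> str:
    """Minimal edit: swap one visual detail and leave the rest of the text unchanged."""
    if original_fact not in response:
        raise ValueError(f"{original_fact!r} not found in response")
    return response.replace(original_fact, wrong_fact, 1)


# A judge maps (image_id, question, response) to a scalar score.
Judge = Callable[[str, str, str], float]


def pairwise_accuracy(judge: Judge, pairs: List[CounterfactualPair]) -> float:
    """Fraction of pairs where the judge scores the correct response strictly higher."""
    if not pairs:
        return 0.0
    wins = sum(
        judge(p.image_id, p.question, p.correct) > judge(p.image_id, p.question, p.perturbed)
        for p in pairs
    )
    return wins / len(pairs)


# Toy stand-in for image content (assumption: the true color is known per image).
FACTS: Dict[str, str] = {"img_001": "red"}


def text_only_judge(image_id: str, question: str, response: str) -> float:
    """Ignores the image entirely and rewards longer text, mimicking a text-biased judge."""
    return float(len(response))


def grounded_judge(image_id: str, question: str, response: str) -> float:
    """Checks the response against the (toy) visual fact for that image."""
    return 1.0 if FACTS[image_id] in response else 0.0


response = "The red car is parked next to a tree."
pair = CounterfactualPair("img_001", "Describe the scene.", response,
                          perturb(response, "red", "blue"))

print(pairwise_accuracy(text_only_judge, [pair]))  # → 0.0 ("blue" makes the edit one character longer)
print(pairwise_accuracy(grounded_judge, [pair]))   # → 1.0
```

Because the two responses differ by a single word, any preference between them can only come from that detail. A judge that ignores the image cannot reliably separate the pair, which is the property a minimal-edit design is meant to expose.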

IMPACT Addresses a key limitation in multimodal LLM evaluation and could make multimodal LLM judges more reliable on tasks that require visual-textual alignment.

LLM-as-a-Judge
Multimodal Large Language Models
Perceptual Judgment Bias
GRPO
Perceptually Perturbed Judgment Dataset