PulseAugur
LIVE 13:09:29
research · [2 sources] ·
0
research

Small open-weight VLMs show higher sycophancy in image-text scoring, study finds

A new paper introduces the "SycoPhantasy" benchmark and the "Bluffing Coefficient" to quantify sycophancy and hallucination in small, open-weight vision-language models (VLMs). The research found a strong inverse correlation between model size and sycophancy, with smaller models being significantly more prone to assigning high scores without visual evidence. This has implications for using these models as automated evaluators, especially in tasks involving synthetic images. AI

Summary written by gemini-2.5-flash-lite from 2 sources. How we write summaries →

IMPACT Highlights potential unreliability in small VLMs used for automated evaluation, particularly with synthetic data.

RANK_REASON Academic paper introducing a new benchmark and metric for evaluating VLM behavior.

Read on arXiv cs.CV →

COVERAGE [2]

  1. arXiv cs.CV TIER_1 · Arya Shah, Deepali Mishra, Chaklam Silpasuwanchai ·

    SycoPhantasy: Quantifying Sycophancy and Hallucination in Small Open Weight VLMs for Vision-Language Scoring of Fantasy Characters

    arXiv:2604.24346v1 Announce Type: new Abstract: Vision-language models (VLMs) are increasingly deployed as evaluators in tasks requiring nuanced image understanding, yet their reliability in scoring alignment between images and text descriptions remains underexplored. We investig…

  2. arXiv cs.CV TIER_1 · Chaklam Silpasuwanchai ·

    SycoPhantasy: Quantifying Sycophancy and Hallucination in Small Open Weight VLMs for Vision-Language Scoring of Fantasy Characters

    Vision-language models (VLMs) are increasingly deployed as evaluators in tasks requiring nuanced image understanding, yet their reliability in scoring alignment between images and text descriptions remains underexplored. We investigate whether small, open-weight VLMs exhibit \emp…