Image-text metrics fail semantic invariance tests, researchers find

By PulseAugur Editorial · [1 source] · 2026-05-26 04:00

Researchers have found that popular reference-free image-to-text evaluation metrics often fail to respect semantic invariances. Across five evaluators, including CLIPScore and PAC-S, benign spatial edits and caption rephrasings that leave the meaning intact still shift scores and can flip rankings. A human study confirmed that annotators judged the perturbed image-caption pairs to be as correct as the originals, so the score changes reflect the metrics' own behavior rather than any change in meaning. The researchers propose an invariance-calibrated scoring method to mitigate these issues.
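To make "score shifts" and "ranking flips" concrete, here is a minimal sketch of how an invariance probe might quantify them. It assumes an evaluator has already scored each original caption and its meaning-preserving perturbation. The helper names and the toy numbers are hypothetical and are not taken from the paper. A flip is counted as a pair of items whose relative order reverses, the discordant-pair notion used in Kendall's tau.

```python
from itertools import combinations


def score_shifts(original, perturbed):
    """Absolute score change per image-caption pair after a
    meaning-preserving perturbation (hypothetical helper)."""
    return [abs(p - o) for o, p in zip(original, perturbed)]


def ranking_flips(original, perturbed):
    """Count item pairs whose relative order reverses after perturbation.

    Pairs tied in either list are not counted as flips (hypothetical helper).
    """
    flips = 0
    for i, j in combinations(range(len(original)), 2):
        before = original[i] - original[j]
        after = perturbed[i] - perturbed[j]
        if before * after < 0:  # strictly opposite signs: the order swapped
            flips += 1
    return flips


# Made-up scores for three captions, for illustration only.
orig = [0.82, 0.75, 0.60]
pert = [0.70, 0.78, 0.61]

print(score_shifts(orig, pert))
print(ranking_flips(orig, pert))
```

If a metric respected the invariance, both quantities would stay near zero. Here the first two captions swap order even though nothing about their meaning changed.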

IMPACT Highlights flaws in current image-text evaluation, potentially leading to more robust and reliable AI model assessments.

RANK_REASON The cluster contains an academic paper detailing a new evaluation methodology for image-text metrics.

paper
safety

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CV TIER_1 English(EN) · Amit Agarwal, Hitesh Laxmichand Patel, Meizhu Liu, Jyotika Singh, Karan Dua, Hansa Meghwani, Matthew Rowe, Michael Avendi, Yassi Abbasi, Tao Sheng, Sujith Ravi, Dan Roth · 2026-05-26 04:00

Do Image-Text Metrics Respect Semantic Invariances?

arXiv:2605.24702v1 Announce Type: new Abstract: Reference-free image-to-text evaluators are now standard for scoring image-caption alignment, yet it is unclear whether they respect semantic invariances. We present an invariance probe on five popular evaluators (CLIPScore, PAC-S, …
