BEiTScore: Reference-free Image Captioning Evaluation with an Efficient Cross-Encoder Model
Researchers have developed BEiTScore, a reference-free evaluation metric for image captioning that addresses limitations of existing methods. The metric uses an efficient cross-encoder model, initialized from a visual question-answering checkpoint, to score captions with greater sensitivity at a feasible computational cost. BEiTScore is trained on a diverse dataset that includes adversarial augmentations, and it demonstrates state-of-the-art performance on a new benchmark designed for detailed captioning evaluation.
AI IMPACT: Introduces a more efficient and sensitive method for evaluating image captioning models, potentially improving model development and quality assessment.