Researchers have developed FineGen, a novel framework that uses vision-language models (VLMs) and a multi-agent system to automatically construct image-text datasets. This system employs a collaborative pipeline for generating, verifying, and correcting data, specifically focusing on creating hard negative samples that are semantically relevant but visually contradictory. The framework has been used to create FineGen-100K, a dataset with over 147,000 hard negatives, which significantly improved accuracy on downstream tasks by 14.4% when used for fine-tuning. AI
IMPACT Enhances fine-grained perception capabilities by providing specialized datasets for training vision-language models.
RANK_REASON The cluster contains a research paper detailing a new framework and dataset for image-text construction. [lever_c_demoted from research: ic=1 ai=1.0]
AI-generated summary · Google Gemini · from 1 sources. How we write summaries →