FineGen framework creates hard negative image-text datasets

By PulseAugur Editorial · [1 source] · 2026-06-09 04:00

Researchers have developed FineGen, a framework that uses vision-language models (VLMs) in a multi-agent system to construct image-text datasets automatically. Its agents work in a collaborative pipeline that generates, verifies, and corrects data, with a focus on hard negative samples: samples that are semantically relevant to an image but contradict its visual content. The authors used the framework to build FineGen-100K, a dataset with more than 147,000 hard negatives. Fine-tuning on it improved accuracy on downstream tasks by 14.4%.
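To make the generate-verify-correct idea concrete, here is a toy sketch of such a loop for producing one hard negative. It is not FineGen's implementation. The VLM agents are replaced by hypothetical rule-based stand-ins, an "image" is modeled as the set of words a perfect perception model would confirm, and every name (`ToyImage`, `SWAPS`, `generate`, `verify`, `build_hard_negative`) is illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for an image: the words a perfect VLM would confirm
# as visible. FineGen itself uses real VLM agents, not word sets.
@dataclass
class ToyImage:
    visible: set[str]


# Hypothetical substitution table: each attribute word maps to plausible swaps.
SWAPS = {
    "red": ["blue", "green"],
    "left": ["right"],
    "two": ["one", "three"],
}


def generate(caption: str, attempt: int) -> Optional[str]:
    """Toy generator agent: apply the attempt-th single-word substitution."""
    tokens = caption.split()
    candidates = [(i, alt) for i, tok in enumerate(tokens) for alt in SWAPS.get(tok, [])]
    if attempt >= len(candidates):
        return None  # no substitutions left to try
    i, alt = candidates[attempt]
    return " ".join(tokens[:i] + [alt] + tokens[i + 1:])


def verify(image: ToyImage, positive: str, negative: str) -> bool:
    """Toy verifier agent: accept only if the negative differs from the positive
    in exactly one word (semantically close) and that new word is not visible
    in the image (visually contradictory)."""
    pos, neg = positive.split(), negative.split()
    if len(pos) != len(neg):
        return False
    diffs = [(p, n) for p, n in zip(pos, neg) if p != n]
    if len(diffs) != 1:
        return False
    _, new_word = diffs[0]
    return new_word not in image.visible


def build_hard_negative(image: ToyImage, positive: str, max_rounds: int = 5) -> Optional[str]:
    """Generate-verify-correct loop: a rejected candidate triggers a correction
    round in which the generator proposes a different edit."""
    for attempt in range(max_rounds):
        candidate = generate(positive, attempt)
        if candidate is None:
            return None
        if verify(image, positive, candidate):
            return candidate
    return None


# The image also shows something blue, so "blue" is rejected as unsafe.
img = ToyImage(visible={"a", "red", "car", "on", "the", "left", "blue"})
print(build_hard_negative(img, "a red car on the left"))  # prints: a green car on the left
```

The verifier is deliberately conservative: it rejects any swap whose new word appears anywhere in the image, trading recall for fewer false hard negatives. That is one plausible motivation for having a separate verification stage at all.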

IMPACT Enhances fine-grained perception capabilities by providing specialized datasets for training vision-language models.

RANK_REASON The cluster contains a research paper detailing a new framework and dataset for image-text dataset construction.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.AI TIER_1 English(EN) · Chang Kong, Yuebing Li, Peng Mo, Haigang Zhang, Qiuming Luo · 2026-06-09 04:00

FineGen: A VLM-based Multi-Agent Framework for Fine-Grained Image-Text Dataset Construction

arXiv:2606.07645v1 Announce Type: cross Abstract: The scarcity of hard negative samples in current vision-language datasets significantly hinders fine-grained perception. To address this, we propose FineGen, a VLM-based Multi-Agent framework for automated dataset construction. By…
