New methods improve large vision-language models at fine-grained visual recognition

By PulseAugur Editorial Team · [2 sources] · 2026-04-28 04:00

Two new research papers propose methods for improving Fine-Grained Visual Recognition (FGVR) with Large Vision-Language Models (LVLMs). The first, SARE, is a training-free framework that adapts how much reasoning it applies to each sample according to recognition difficulty, and reuses past failures to improve both accuracy and efficiency. The second, Fine-R1, combines Chain-of-Thought reasoning with policy optimization so that multi-modal LLMs perform well on FGVR with minimal training data; the authors report that it outperforms existing models on both seen and unseen categories.
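The summary describes SARE only at a high level, and the abstract excerpt below does not spell out its algorithm. As a rough illustration of the general idea of sample-wise adaptive reasoning, the sketch below routes each sample by a confidence margin: if the top class clearly beats the runner-up, the fast prediction is accepted; otherwise a slower reasoning step is invoked over a candidate set widened with classes that were confused with the top prediction in the past. This is a minimal sketch under stated assumptions, not SARE's actual method: `FailureMemory`, `adaptive_recognize`, the 0.2 margin threshold, the class scores, and the `toy_reasoner` stand-in for an LVLM call are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class FailureMemory:
    """Hypothetical store of past misrecognitions as (predicted, true) pairs."""
    entries: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, predicted: str, true_label: str) -> None:
        self.entries.append((predicted, true_label))

    def hints_for(self, candidate: str) -> List[str]:
        # True labels that were previously mistaken for this candidate.
        return [t for p, t in self.entries if p == candidate]


def adaptive_recognize(
    scores: Dict[str, float],
    reason: Callable[[List[str]], str],
    memory: FailureMemory,
    threshold: float = 0.2,  # hypothetical margin separating "easy" from "hard"
) -> Tuple[str, bool]:
    """Return (label, used_reasoning) for one sample."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    top = ranked[0]
    runner_up: Optional[str] = ranked[1] if len(ranked) > 1 else None
    margin = scores[top] - (scores[runner_up] if runner_up is not None else 0.0)
    if margin >= threshold:
        return top, False  # easy sample: skip the expensive reasoning step
    # Hard sample: add past confusions to the shortlist, then reason over it.
    shortlist = ranked[:3]
    extra = [h for h in memory.hints_for(top) if h not in shortlist]
    return reason(shortlist + extra), True


def toy_reasoner(candidates: List[str]) -> str:
    # Stand-in for a chain-of-thought LVLM call; it just picks the last candidate.
    return candidates[-1]


memory = FailureMemory()
memory.add("California Gull", "Western Gull")

easy = adaptive_recognize({"Cardinal": 0.9, "Tanager": 0.05}, toy_reasoner, memory)
hard = adaptive_recognize(
    {"California Gull": 0.41, "Herring Gull": 0.39, "Ring-billed Gull": 0.10},
    toy_reasoner,
    memory,
)
print(easy)  # ('Cardinal', False)
print(hard)  # ('Western Gull', True)
```

The design choice being illustrated is that reasoning cost is spent only where the fast prediction is ambiguous, which is one plausible way to trade accuracy against efficiency; how SARE actually measures difficulty and stores failures is described in the paper itself.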

Impact: Introduces techniques for fine-grained visual recognition that could improve AI systems' ability to distinguish subtle visual differences between closely related categories.

Ranking rationale: Two academic papers posted on arXiv present new methods for fine-grained visual recognition using large vision-language models.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.CV TIER_1 English(EN) · Jingxiao Yang, DaLin He, Miao Pan, Kaixiang Yao, Ge Su, Wenqi Zhang, Yifeng Hu, Tangwei Li, Yuke Li, Xuhong Zhang · 2026-04-29 04:00

SARE: Sample-wise Adaptive Reasoning for Training-free Fine-grained Visual Recognition

arXiv:2603.17729v3 · Announce Type: replace

Abstract: Recent advances in Large Vision-Language Models (LVLMs) have enabled training-free Fine-Grained Visual Recognition (FGVR). However, effectively exploiting LVLMs for FGVR remains challenging due to the inherent visual ambiguity o…

arXiv cs.CV TIER_1 English(EN) · Hulingxiao He, Zijun Geng, Yuxin Peng · 2026-04-28 04:00

Fine-R1: Make Multi-modal LLMs Excel in Fine-Grained Visual Recognition by Chain-of-Thought Reasoning

arXiv:2602.07605v3 · Announce Type: replace

Abstract: Any entity in the visual world can be hierarchically grouped based on shared characteristics and mapped to fine-grained sub-categories. While Multi-modal Large Language Models (MLLMs) achieve strong performance on coarse-grained…
