Count Anything at Any Granularity

New model HieraCount improves object counting with a multi-granularity approach

By the PulseAugur editorial team · [1 source] · 2026-05-11 17:32

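Researchers have proposed a new framework for open-world object counting that targets a weakness of current vision-language models (VLMs): they are brittle at identifying and counting the objects a user actually intends. The authors reframe counting as a multi-granularity problem, in which visual exemplars and detailed text prompts (including negative prompts) together specify both the target's appearance and its semantic granularity. To address the limitations of this approach, they built an automated pipeline that combines 3D synthesis with VLM-based filtering to create KubriCount, which they describe as the largest dataset for counting tasks. Their new model, HieraCount, uses both text and visual exemplars, substantially improves multi-granularity counting accuracy, and generalizes to real-world scenes.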
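To make "counting granularity" concrete, here is a toy Python sketch. It assumes every object in a scene already carries a hypothetical coarse-to-fine label path; the `Instance` type and `count` function are illustrative assumptions, not HieraCount's method or API. It only shows how one scene yields different counts depending on how fine the positive prompt is and which negative prompts exclude.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    # Hypothetical: each object carries a coarse-to-fine label path,
    # e.g. ("vehicle", "car", "taxi"). Real models infer this from pixels.
    path: tuple[str, ...]


def count(instances, positive, negatives=()):
    """Count instances whose label path contains `positive`
    but contains none of the `negatives` (toy negative prompts)."""
    return sum(
        1
        for inst in instances
        if positive in inst.path and not any(n in inst.path for n in negatives)
    )


scene = [
    Instance(("vehicle", "car", "taxi")),
    Instance(("vehicle", "car", "sedan")),
    Instance(("vehicle", "car", "taxi")),
    Instance(("vehicle", "bus")),
]

print(count(scene, "vehicle"))        # coarse granularity: 4
print(count(scene, "car"))            # finer granularity: 3
print(count(scene, "car", ["taxi"]))  # with a negative prompt: 1
```

The same image answers "how many?" with 4, 3, or 1 depending on the query, which is why leaving granularity implicit makes counting ambiguous.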

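Impact: Introduces a more robust approach to object counting, which could benefit applications that rely on understanding and quantifying visual scenes.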

Ranking rationale: This cluster contains an object-counting research paper that introduces a new model and dataset.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CV · Weidi Xie · 2026-05-11 17:32

Count Anything at Any Granularity

Open-world object counting remains brittle: despite rapid advances in vision-language models (VLMs), reliably counting the objects a user intends is far from solved. We argue that a central reason is that counting granularity is left implicit; users may refer to a specific identi…
