ABACUS model unifies image count understanding and generation

By PulseAugur Editorial · [1 source] · 2026-06-24 04:00

Researchers have developed ABACUS, a unified vision-language model that both counts objects in images and generates images that are faithful to a requested object count. Built on a 3B-parameter foundation model, it combines density-aware adaptive zooming, a boundary-aware count policy, and a cycle-consistent GRPO strategy to improve accuracy and narrow the gap between understanding and generation. According to the paper, ABACUS achieves state-of-the-art performance across seven benchmarks, surpassing both specialized models and larger generalist ones.

IMPACT This model advances the capabilities of vision-language models in count-related tasks, potentially improving applications in image analysis and generation.

RANK_REASON The cluster contains an academic paper detailing a new model and its performance on benchmarks.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CV TIER_1 English(EN) · Anindya Mondal, Sauradip Nag, Anjan Dutta · 2026-06-24 04:00

ABACUS: Adapting Unified Foundation Model for Bridging Image Count Understanding and Generation

arXiv:2606.23835v1 Announce Type: new Abstract: ABACUS is a unified vision-language model that handles object counting, crowd counting, referring-expression counting, and count-faithful image generation without any benchmark-specific training required. Our model is built on exist…
