PulseAugur

New AI Models Tackle Object Counting Challenges with Efficiency and Robustness

Researchers have introduced MambaCount, a framework for text-guided open-vocabulary object counting (TOOC) that uses a Spatial Sparse State Space Duality (S^4D) block to overcome the limitations of Transformers in dense scenes with large scale variations. MambaCount addresses two issues, causal modeling in Mamba and high entropy in spatial token responses, and achieves state-of-the-art performance on the FSC-147 dataset with linear complexity. Concurrently, RT-Counter offers a real-time solution for the same task, balancing accuracy and efficiency through a Visual Prototype Textualization module and Weaving Transformer layers; it achieves competitive results while being significantly faster and more parameter-efficient. Additionally, a new benchmark, Robust-TOOC, has been proposed to evaluate object counting under adverse conditions, alongside Dual-TTT, a test-time training framework designed to improve robustness without altering existing architectures.

IMPACT These advancements in object counting could improve AI's ability to understand and interact with complex visual scenes, impacting applications in robotics, autonomous driving, and image analysis.

RANK_REASON Multiple research papers introducing new models and benchmarks in the field of computer vision.

AI-generated summary · Google Gemini · from 6 sources.

COVERAGE [6]

  1. arXiv cs.CL TIER_1 English(EN) · Hao-Yuan Ma, Li Zhang, Minjie Qiang, Jie Gao ·

    MambaCount: Efficient Text-guided Open-vocabulary Object Counting with Spatial Sparse State Space Duality Block

    arXiv:2606.17650v1 · Announce Type: cross · Abstract: Text-guided Open-vocabulary Object Counting (TOOC) aims to estimate the number of objects described by text prompts, which is particularly challenging in dense scenes with large scale variations. Existing TOOC approaches predomina…

  2. arXiv cs.CL TIER_1 English(EN) · Jie Gao ·

    MambaCount: Efficient Text-guided Open-vocabulary Object Counting with Spatial Sparse State Space Duality Block

    Text-guided Open-vocabulary Object Counting (TOOC) aims to estimate the number of objects described by text prompts, which is particularly challenging in dense scenes with large scale variations. Existing TOOC approaches predominantly rely on Transformers, whose quadratic complex…

  3. arXiv cs.CV TIER_1 English(EN) · Hao-Yuan Ma, Li Zhang, Zhiwei Zhu, Jie Gao ·

    RT-Counter: Real-Time Text-Guided Open-Vocabulary Object Counting

    arXiv:2606.17561v1 · Announce Type: new · Abstract: Text-guided open-vocabulary object counting (TOOC) aims to count objects belonging to the categories specified by natural language descriptions. Although vision-language pre-trained models have been successfully applied to TOOC tasks,…

  4. arXiv cs.CV TIER_1 English(EN) · Hao-Yuan Ma, Yuda Zou, Li Zhang, Yongchao Xu ·

    Test-Time Training for Robust Text-Guided Open-Vocabulary Object Counting

    arXiv:2606.17601v1 · Announce Type: new · Abstract: Text-guided Open-vocabulary Object Counting (TOOC) enables counting arbitrary object categories specified by text prompts, offering substantially greater flexibility than conventional closed-set counting. However, existing TOOC meth…

  5. arXiv cs.CV TIER_1 English(EN) · Yongchao Xu ·

    Test-Time Training for Robust Text-Guided Open-Vocabulary Object Counting

    Text-guided Open-vocabulary Object Counting (TOOC) enables counting arbitrary object categories specified by text prompts, offering substantially greater flexibility than conventional closed-set counting. However, existing TOOC methods are developed and evaluated primarily on ide…

  6. arXiv cs.CV TIER_1 English(EN) · Jie Gao ·

    RT-Counter: Real-Time Text-Guided Open-Vocabulary Object Counting

    Text-guided open-vocabulary object counting (TOOC) aims to count objects belonging to the categories specified by natural language descriptions. Although vision-language pre-trained models have been successfully applied to TOOC tasks, they still struggle with fine-grained spatial u…