Brief · PulseAugur

RESEARCH · arXiv cs.CV · English (EN) · 1d · [6 sources]

RT-Counter: Real-Time Text-Guided Open-Vocabulary Object Counting

Researchers have introduced MambaCount, a framework for text-guided open-vocabulary object counting. It uses a Spatial Sparse State Space Duality (S^4D) block to overcome the limitations Transformers face in dense scenes and under large scale variations. MambaCount addresses two problems, causal modeling in Mamba and high entropy in spatial token responses, and achieves state-of-the-art performance on the FSC-147 dataset with linear complexity. Concurrently, RT-Counter offers a real-time solution for the same task. It balances accuracy and efficiency through a Visual Prototype Textualization module and Weaving Transformer layers, achieving competitive results while being significantly faster and more parameter-efficient. A new benchmark, Robust-TOOC, has also been proposed to evaluate object counting under adverse conditions, alongside Dual-TTT, a test-time training framework designed to improve robustness without altering existing architectures.

IMPACT These advances in object counting could improve how AI systems interpret complex visual scenes, with potential applications in robotics, autonomous driving, and image analysis.