Good Token Hunting: A Hitchhiker's Guide to Token Selection for Visual Geometry Transformers
Two new methods aim to make visual geometry transformers more efficient and better at spatial reasoning. The first, "Good Token Hunting," is a two-stage framework that cuts computational cost by selecting only the essential tokens, and it reports over 85% acceleration on scenes with 500 images. The second, "GeoWeaver," grounds visual tokens with geometric evidence before scene reasoning, and improves spatial reasoning by adaptively allocating geometric abstractions to individual tokens.
AI IMPACT: Good Token Hunting offers a substantial speed-up and GeoWeaver offers improved reasoning for visual geometry transformers, which could accelerate 3D reconstruction and spatial understanding tasks.
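
To make the idea of token selection concrete, the sketch below keeps only the highest-scoring fraction of a token set and compares the pairwise attention cost before and after. It is a generic top-k illustration, not the Good Token Hunting method: the summary does not describe the two stages or the selection criterion, so the function name `select_top_tokens`, the `keep_ratio` parameter, and the L2-norm importance score are all assumptions made for illustration.

```python
import numpy as np

def select_top_tokens(tokens, scores, keep_ratio):
    """Keep the highest-scoring fraction of tokens, preserving their original order.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n = tokens.shape[0]
    k = max(1, int(round(n * keep_ratio)))
    # argpartition places the k largest scores in the last k slots without a full sort
    idx = np.argpartition(scores, n - k)[n - k:]
    idx.sort()  # restore original token order
    return tokens[idx], idx

rng = np.random.default_rng(0)
tokens = rng.normal(size=(1000, 64))      # 1000 tokens with 64 features each
scores = np.linalg.norm(tokens, axis=1)   # assumed importance score (L2 norm)

kept, kept_idx = select_top_tokens(tokens, scores, keep_ratio=0.25)

# Global self-attention compares every token with every other token,
# so its cost grows with the square of the token count.
full_cost = tokens.shape[0] ** 2
pruned_cost = kept.shape[0] ** 2
```

Because attention cost is quadratic in the number of tokens, keeping a quarter of the tokens in this toy setup reduces the pairwise comparisons to one sixteenth of the original. That scaling is why token selection can pay off on scenes with hundreds of images, although the speed-up reported for Good Token Hunting depends on details not given here.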