Brief · PulseAugur

RESEARCH · arXiv cs.CV · 1w · [4 sources]

Adaptive Tokenisation Via Temporal Redundancy Masking And Latent Inpainting

Researchers have developed new methods for adaptive image and video tokenization that let models allocate tokens, and therefore compute, according to visual complexity. AdaTok, a self-budgeting discrete 1D tokenizer, learns to adjust its token count per image and achieves competitive fidelity with significantly fewer tokens on average. Separately, a new framework for adaptive video tokenization uses temporal redundancy masking and latent inpainting to allocate tokens based on content, yielding substantial inference-time speedups.

IMPACT These adaptive tokenization techniques could lead to more efficient AI models for image and video processing, reducing computational costs and increasing inference speeds.
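
As a rough illustration of the temporal redundancy masking idea mentioned in the summary, the sketch below drops latent tokens that barely change from one frame to the next. It is a minimal sketch under stated assumptions, not the method from the cited work: the function names, the mean absolute difference criterion, the comparison against the immediately preceding frame, and the threshold value are all hypothetical. The latent inpainting step that would fill in the dropped tokens is not shown.

```python
from typing import List

Token = List[float]   # one latent token vector (hypothetical representation)
Frame = List[Token]   # all latent tokens of one video frame


def redundancy_mask(frames: List[Frame], threshold: float) -> List[List[bool]]:
    """Return keep[t][i], True when token i of frame t should be kept.

    Assumption: a token is redundant if its mean absolute difference
    from the same position in the previous frame is <= threshold.
    The first frame is always kept in full.
    """
    keep: List[List[bool]] = []
    for t, frame in enumerate(frames):
        if t == 0:
            keep.append([True] * len(frame))
            continue
        prev = frames[t - 1]
        row = []
        for cur_tok, prev_tok in zip(frame, prev):
            diff = sum(abs(a - b) for a, b in zip(cur_tok, prev_tok)) / len(cur_tok)
            row.append(diff > threshold)
        keep.append(row)
    return keep


def kept_token_count(keep: List[List[bool]]) -> int:
    """Number of tokens that survive masking across all frames."""
    return sum(sum(row) for row in keep)


# Toy clip: 3 frames, 2 tokens each; only the second token changes, in frame 3.
frames = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [2.0, 2.0]],
]
keep = redundancy_mask(frames, threshold=0.1)
print(keep, kept_token_count(keep))
```

In this toy clip the static frame contributes no new tokens, so the token budget follows the content rather than the frame count, which is the general behaviour the summary describes.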

ElasticTok-CV
InfoTok
Latent Inpainting Transformer (LIT)
Sai Aditya Patkuri
TokenBench
ImageNet-1K
AdaTok