HACK++: Towards More Effective Head-Aware Key-Value Compression for Efficient Visual Autoregressive Modeling
Researchers have developed HACK++, a training-free key-value (KV) cache compression framework that reduces the memory and computational overhead of Visual Autoregressive (VAR) models. The method analyzes attention heads and categorizes them as 'Contextual' or 'Structural', then allocates compression budgets adaptively according to each head's function and its reliance on historical scales. The reported result is a substantial reduction in attention and cache budgets without compromising generation quality.
AI IMPACT: Reduces memory and compute for visual autoregressive models, potentially enabling larger-scale deployments and faster inference.
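
To make the idea of head-aware budget allocation concrete, here is a minimal sketch in Python. It is not the HACK++ algorithm: the summary does not give the actual allocation rule, so the `HeadProfile` class, the `history_reliance` score, the `type_weight` table, and the proportional weighting are all hypothetical names and assumptions chosen for illustration. The sketch only shows how a fixed cache budget could be divided unevenly across heads once each head has a type label and a score for how much it depends on earlier scales.

```python
from dataclasses import dataclass


@dataclass
class HeadProfile:
    name: str
    kind: str                # "contextual" or "structural"; assumed to be known already
    history_reliance: float  # hypothetical score in [0, 1]


def allocate_budgets(heads, total_budget, type_weight):
    """Split total_budget cache entries across attention heads.

    Each head's share is proportional to type_weight[kind] * history_reliance.
    This weighting rule is an illustrative assumption, not the HACK++ rule.
    """
    if not heads:
        return {}

    weights = [type_weight[h.kind] * h.history_reliance for h in heads]
    total_weight = sum(weights)
    if total_weight <= 0:
        # Fall back to an equal split when no head carries any weight.
        weights = [1.0] * len(heads)
        total_weight = float(len(heads))

    raw = [total_budget * w / total_weight for w in weights]
    budgets = [int(r) for r in raw]  # round down first

    # Give the leftover entries to the heads with the largest fractional
    # parts, so the budgets sum to total_budget.
    leftover = max(0, total_budget - sum(budgets))
    by_remainder = sorted(
        range(len(heads)), key=lambda i: raw[i] - budgets[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        budgets[i] += 1

    return {h.name: b for h, b in zip(heads, budgets)}


if __name__ == "__main__":
    # Made-up heads and weights, purely for demonstration.
    heads = [
        HeadProfile("h0", "contextual", 0.8),
        HeadProfile("h1", "structural", 0.2),
        HeadProfile("h2", "contextual", 0.5),
    ]
    weights = {"contextual": 1.0, "structural": 0.5}
    print(allocate_budgets(heads, total_budget=150, type_weight=weights))
```

The largest-remainder rounding step is a design choice of this sketch: it keeps the per-head budgets as whole cache entries while guaranteeing that they add up to the overall budget.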