PERL: Parameter Efficient Reasoning in CLIP Latent Space
PERL is a framework for adapting vision-language models such as CLIP to new tasks while adding only a small number of trainable parameters. It performs iterative reasoning in the model's latent space, progressively refining representations through a compact reasoning module. The approach is reported to achieve a better parameter-performance trade-off on multiple benchmarks, reaching strong accuracy with few trainable parameters.
AI IMPACT: Offers a more parameter-efficient way to adapt large vision-language models to new tasks, potentially reducing computational cost and improving performance on specialized applications.
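To make the idea of iterative refinement in latent space concrete, here is a minimal sketch in plain Python. It is not PERL's actual architecture, which this summary does not specify. The `ReasoningModule` class, its low-rank bottleneck shape, the residual update, the step count, and the 512-dimensional embedding are all assumptions chosen for illustration. The frozen backbone is represented only by a fixed input vector.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.02):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def l2_normalize(vector):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


class ReasoningModule:
    """Hypothetical compact module: dim -> hidden -> dim bottleneck.

    These two matrices are the only trainable weights in the sketch;
    the backbone that produced the embedding is assumed to stay frozen.
    """

    def __init__(self, dim, hidden, seed=0):
        rng = random.Random(seed)
        self.down = make_matrix(hidden, dim, rng)
        self.up = make_matrix(dim, hidden, rng)

    def num_parameters(self):
        return sum(len(r) for r in self.down) + sum(len(r) for r in self.up)

    def step(self, z):
        """One refinement step: residual update, then re-normalize."""
        h = [math.tanh(x) for x in matvec(self.down, z)]
        delta = matvec(self.up, h)
        return l2_normalize([a + b for a, b in zip(z, delta)])


def refine(z0, module, num_steps=3):
    """Apply the same module repeatedly to a frozen embedding."""
    z = l2_normalize(z0)
    for _ in range(num_steps):
        z = module.step(z)
    return z


# Stand-in for an embedding produced by a frozen encoder (assumed width 512).
rng = random.Random(1)
frozen_embedding = [rng.gauss(0.0, 1.0) for _ in range(512)]

module = ReasoningModule(dim=512, hidden=8)
refined = refine(frozen_embedding, module, num_steps=3)
print(module.num_parameters())  # 2 * 512 * 8 = 8192
```

Because the same small module is reused at every step, the trainable parameter count in this sketch does not grow with the number of refinement steps. That is one plausible way to get a favorable parameter-performance trade-off, though the real method may differ.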