Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training
Researchers have developed a new parameter-efficient fine-tuning technique for multimodal large language models called ART (Art-based Reinforcement Training). Unlike existing methods that modify computational graphs, ART optimizes only the raw visual input of a frozen model. This approach allows for fine-tuning on pre-compiled high-throughput engines and can stylize the optimized visual input as computational artworks. ART has demonstrated competitive accuracy with LoRA on mathematics and structured-tool-use benchmarks, confirming its effectiveness across various Qwen model sizes. AI
IMPACT Enables more efficient fine-tuning of multimodal models, potentially accelerating development and deployment.