Brief · PulseAugur

RESEARCH · arXiv cs.AI English(EN) · 1d · [2 sources]

Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training

Researchers have introduced ART (Art-based Reinforcement Training), a parameter-efficient fine-tuning technique for multimodal large language models. Unlike existing methods, which modify the model's computational graph, ART leaves the model frozen and optimizes only its raw visual input. Because the graph is untouched, fine-tuning can run on pre-compiled high-throughput inference engines, and the optimized visual input can be stylized as computational artwork. ART is reported to reach accuracy competitive with LoRA on mathematics and structured-tool-use benchmarks across several Qwen model sizes.
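To make the core idea concrete, here is a minimal sketch of tuning only an input while the model stays frozen. It is not the ART algorithm: the brief does not say which optimizer or reward ART uses, so this stand-in uses simple hill climbing with forward passes only, and `frozen_model`, `optimize_input`, the toy weights, and the reward are all hypothetical names and values chosen for illustration.

```python
import random


def frozen_model(image, weights):
    """Hypothetical stand-in for a frozen model.

    Returns a scalar reward for an input "image" (a flat list of pixels).
    The weights are read but never modified.
    """
    score = sum(w * p for w, p in zip(weights, image))
    # Toy reward: highest (zero) when the weighted sum hits the target 1.0.
    return -(score - 1.0) ** 2


def optimize_input(weights, n_pixels=16, steps=500, sigma=0.1, seed=0):
    """Hill-climb the input pixels using forward evaluations only.

    No gradients and no access to model internals are needed, which is the
    property that would let such a loop run against a pre-compiled engine.
    """
    rng = random.Random(seed)
    image = [0.5] * n_pixels  # start from a uniform gray input
    initial = frozen_model(image, weights)
    best = initial
    for _ in range(steps):
        # Perturb every pixel, keeping values in the valid range [0, 1].
        candidate = [
            min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in image
        ]
        reward = frozen_model(candidate, weights)
        if reward >= best:  # accept only candidates that are not worse
            image, best = candidate, reward
    return image, initial, best


# Toy frozen weights (an assumption for this sketch, not from the paper).
weights = [((i % 4) - 1.5) / 4 for i in range(16)]
image, initial, best = optimize_input(weights)
print(f"reward before: {initial:.4f}, after: {best:.4f}")
```

The only trainable state here is `image`; everything inside `frozen_model` is treated as a black box, which mirrors the brief's claim that the model's computational graph is left unmodified.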

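IMPACT Lets multimodal models be fine-tuned without changing their weights or computational graph, which could make adaptation cheaper on existing high-throughput serving engines and speed up development and deployment.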

Qwen
LoRA
Large Language Models
vLLM
ART
Soft Prompting
multimodal LLMs