PRISM: Self-Pruning Intrinsic Selection Method for Training-Free Multimodal Data Selection
Researchers have developed PRISM, a training-free framework for efficiently selecting training data for multimodal large language models (MLLMs). The method targets redundancy in large datasets, which drives up computational cost during visual instruction tuning. PRISM models intrinsic visual semantics and re-centers implicit features, mitigating the global semantic drift caused by background elements. The framework cuts the combined time for data selection and model tuning to 30% of that of conventional pipelines, while also improving performance on various benchmarks.
AI IMPACT Reduces MLLM tuning costs and improves performance, potentially accelerating the development and deployment of multimodal AI applications.
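The summary does not spell out PRISM's actual procedure, so the following is only a minimal sketch of the general idea of re-centering features before measuring redundancy. Everything in it is an assumption made for illustration: the function names (`recenter`, `cosine`, `select_least_redundant`), the use of mean cosine similarity as a redundancy score, and the toy feature vectors. It shows how a component shared by every sample (standing in for common background content) can dominate raw similarities, and how subtracting the dataset mean exposes the sample that actually differs.

```python
import math


def recenter(features):
    """Subtract the dataset-wide mean vector from every feature vector."""
    n = len(features)
    dim = len(features[0])
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in features]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def select_least_redundant(features, keep):
    """Return indices of the `keep` samples with the lowest mean cosine
    similarity to all other samples, computed on re-centered features.

    Hypothetical scoring rule for illustration, not PRISM's published one.
    """
    n = len(features)
    if n < 2:
        return list(range(n))[:keep]
    centered = recenter(features)
    scores = []
    for i in range(n):
        sims = [cosine(centered[i], centered[j]) for j in range(n) if j != i]
        scores.append(sum(sims) / len(sims))
    ranked = sorted(range(n), key=lambda i: scores[i])
    return sorted(ranked[:keep])


# Toy features: dimension 0 is a large offset shared by every sample,
# dimension 1 carries the content that distinguishes them.
toy_features = [[10.0, 1.0], [10.0, 1.1], [10.0, 1.05], [10.0, -1.0]]

# On raw features the shared offset makes even the odd sample look similar.
raw_similarity = cosine(toy_features[0], toy_features[3])

# After re-centering, sample 3 stands out as the least redundant.
selected = select_least_redundant(toy_features, keep=1)
```

In this toy setup the raw similarity between samples 0 and 3 stays high because of the shared offset, whereas the re-centered scores single out sample 3. A real pipeline would use features extracted from the MLLM itself and whatever selection criterion the PRISM authors define.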