Dual-Uncertainty Guided Policy Learning for Multimodal Reasoning
Researchers have introduced DUPL (Dual-Uncertainty Guided Policy Learning), a policy learning approach designed to improve multimodal reasoning in large language models. The method targets a specific problem: telling apart uncertainty that comes from complex reasoning and uncertainty that comes from ambiguous visual perception. DUPL quantifies both perceptual uncertainty and output uncertainty and uses them to steer policy updates toward highly ambiguous cases, which makes exploration more targeted. The approach achieved significant accuracy gains on several multimodal reasoning benchmarks and outperformed existing methods. It also carried over to different algorithms and architectures.
IMPACT: Enhances multimodal reasoning in LLMs by better handling perceptual ambiguity.
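The summary does not say how DUPL measures or combines its two uncertainties, so the Python sketch below is only an illustration under stated assumptions. It treats output uncertainty as the entropy of the model's token distribution. It treats perceptual uncertainty as the KL divergence between the token distributions produced from the original image and from a perturbed copy. The sketch adds both to up-weight a REINFORCE-style policy-gradient loss. The function names and the `alpha`/`beta` coefficients are hypothetical and do not come from the DUPL paper.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of one discrete token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) in nats; eps guards against log(0) when q has zeros."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def dual_uncertainty_weights(
    clean_probs: List[List[float]],
    perturbed_probs: List[List[float]],
    alpha: float = 1.0,
    beta: float = 1.0,
) -> List[float]:
    """Hypothetical per-token weights: 1 + alpha * output_unc + beta * perceptual_unc.

    ASSUMPTION: output uncertainty = entropy on the original image;
    perceptual uncertainty = KL shift caused by perturbing the image.
    """
    weights = []
    for p_clean, p_pert in zip(clean_probs, perturbed_probs):
        output_unc = entropy(p_clean)                    # how unsure the model is about its answer
        perceptual_unc = kl_divergence(p_clean, p_pert)  # how much the answer depends on visual detail
        weights.append(1.0 + alpha * output_unc + beta * perceptual_unc)
    return weights


def weighted_policy_loss(
    logprobs: Sequence[float],
    advantages: Sequence[float],
    weights: Sequence[float],
) -> float:
    """REINFORCE-style surrogate loss: -mean(w * A * log pi), so ambiguous tokens get larger updates."""
    n = len(logprobs)
    return -sum(w * a * lp for w, a, lp in zip(weights, advantages, logprobs)) / n


if __name__ == "__main__":
    clean = [[0.9, 0.1], [0.5, 0.5]]      # token 2 is uncertain on the original image
    perturbed = [[0.9, 0.1], [0.1, 0.9]]  # and flips when the image is perturbed
    w = dual_uncertainty_weights(clean, perturbed)
    print(w)
    print(weighted_policy_loss([-0.1, -0.7], [1.0, 1.0], w))
```

In this toy input the second token receives the larger weight. Its output is uncertain, and it also changes when the image is perturbed. The model's policy update therefore concentrates on the ambiguous step, which is the kind of targeted exploration the summary describes.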