CORA: Analyzing and bridging thinking-answer gap in Multimodal RLVR via Consistency-Oriented Reasoning Alignment
Researchers have introduced CORA, a method for addressing thinking-answer inconsistency in large vision-language models (LVLMs). This inconsistency, in which the reasoning trace does not align semantically with the final answer, appears during both training and inference. CORA combines a consistency reward model with Hybrid Reward Advantage Splitting to improve task performance and produce more faithful reasoning traces.
AI IMPACT: Addresses a key challenge in multimodal AI by improving the faithfulness of reasoning processes, potentially leading to more reliable AI outputs.