Brief · PulseAugur

RESEARCH · arXiv cs.CL · English (EN) · 3d · 2 sources

CORA: Analyzing and bridging thinking-answer gap in Multimodal RLVR via Consistency-Oriented Reasoning Alignment

Researchers have introduced CORA, a method for addressing thinking-answer inconsistency in large vision-language models (LVLMs) trained with reinforcement learning with verifiable rewards (RLVR). In this inconsistency, the model's reasoning trace does not semantically agree with its final answer, and the problem persists through both training and inference. CORA combines a consistency reward model with Hybrid Reward Advantage Splitting, aiming to improve task performance while producing more faithful reasoning traces.
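The brief does not say how Hybrid Reward Advantage Splitting works. The sketch below is only an illustration under stated assumptions. It uses GRPO-style group-relative advantages (Group Relative Policy Optimization is listed among the topics), where each sampled response's reward is normalized by the mean and standard deviation of its group. It then assumes a hypothetical "split": the verifiable task reward and a consistency score are normalized separately and summed with a weight. The function names, the `weight` parameter, and the use of 0/1 task rewards with consistency scores in [0, 1] are all illustrative assumptions, not CORA's published method.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: normalize each reward against its sampling group.

    Uses the population standard deviation; eps avoids division by zero
    when every response in the group received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def hybrid_advantages(task_rewards, consistency_scores, weight=0.5, eps=1e-6):
    """Hypothetical advantage split (an assumption, not CORA's exact rule).

    task_rewards: verifiable correctness of each sampled response (e.g. 0 or 1).
    consistency_scores: hypothetical scores in [0, 1] from a consistency
        reward model judging whether the reasoning agrees with the answer.
    Each reward stream is normalized within the group on its own, so the
    consistency signal is not drowned out by the task reward's scale.
    """
    if len(task_rewards) != len(consistency_scores):
        raise ValueError("reward lists must have the same group size")
    a_task = group_relative_advantages(task_rewards, eps)
    a_cons = group_relative_advantages(consistency_scores, eps)
    return [t + weight * c for t, c in zip(a_task, a_cons)]


if __name__ == "__main__":
    # Four sampled responses to one prompt: two correct, two incorrect.
    task = [1, 1, 0, 0]
    # Hypothetical consistency scores: response 0 is correct AND consistent,
    # response 1 is correct but its reasoning contradicts its answer.
    consistency = [0.9, 0.2, 0.8, 0.1]
    for i, adv in enumerate(hybrid_advantages(task, consistency)):
        print(f"response {i}: advantage {adv:+.3f}")
```

Under these assumptions, the correct-and-consistent response receives a larger advantage than the correct-but-inconsistent one, which is the kind of pressure toward faithful reasoning traces the summary describes. Normalizing the two streams separately, rather than adding raw rewards first, is one plausible design choice; the paper may weight or combine the signals differently.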

IMPACT Addresses a key challenge in multimodal AI by improving the faithfulness of reasoning processes, potentially leading to more reliable AI outputs.

Large Language Models
Group Relative Policy Optimization
CORA
Large Vision Language Models
Reinforcement Learning with Verifiable Rewards
Consistency-Oriented Reasoning Alignment
Hybrid Reward Advantage Splitting