Bridging Modality Disconnect in Self-Reflection via Closed-Loop Visually Grounded Verification
Researchers have introduced MIRROR, a framework designed to improve the reasoning of Vision-Language Models (VLMs). MIRROR targets hallucinations and logic errors in VLMs with a closed-loop process in which the model drafts an answer, critiques it, and visually verifies it against specific image regions. To train the model, the researchers created a new dataset, ReflectV, which provides multi-turn supervision with explicit reflection triggers and region-based verification actions.
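The draft, critique, and verify loop described above can be pictured with a small Python sketch. This is not MIRROR's actual implementation: the function `closed_loop_answer`, the `Critique` type, the `Region` box format, the `max_rounds` limit, and the toy stand-in callables are all hypothetical names chosen for illustration, and a real system would call a VLM at each step.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical region format: a bounding box (x0, y0, x1, y1) in pixels.
Region = Tuple[int, int, int, int]


@dataclass
class Critique:
    needs_check: bool          # reflection trigger: is the draft in doubt?
    region: Optional[Region]   # image region to inspect, if any


def closed_loop_answer(
    question: str,
    draft: Callable[[str, List[str]], str],
    critique: Callable[[str, str], Critique],
    verify: Callable[[Region], str],
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Draft an answer, critique it, and re-draft using region evidence."""
    evidence: List[str] = []
    answer = draft(question, evidence)
    for _ in range(max_rounds):
        review = critique(question, answer)
        if not review.needs_check or review.region is None:
            break  # the critique raised no doubt, so the loop closes
        # Look at the flagged image region and feed the finding back in.
        evidence.append(verify(review.region))
        answer = draft(question, evidence)
    return answer, evidence


# Toy stand-ins for the model calls (assumptions, not real model behavior).
def toy_draft(question: str, evidence: List[str]) -> str:
    # Without evidence the draft guesses; with evidence it follows it.
    return evidence[-1] if evidence else "blue"


def toy_critique(question: str, answer: str) -> Critique:
    # Doubt any answer other than "red" and point at a fixed region.
    return Critique(needs_check=(answer != "red"), region=(10, 10, 50, 50))


def toy_verify(region: Region) -> str:
    # Pretend that inspecting the region shows a red object.
    return "red"


answer, evidence = closed_loop_answer(
    "What color is the mug?", toy_draft, toy_critique, toy_verify
)
print(answer, evidence)
```

In this toy run the first draft is a guess, the critique flags it, the verification step reports what the region shows, and the second draft is grounded in that evidence. The `max_rounds` cap is a design assumption that keeps the loop from running forever when the critique never stops raising doubts.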