Perceptual Flow Network and VGR enhance visual reasoning in vision-language models

By PulseAugur Editorial · [3 sources] · 2026-05-04 04:00

Researchers have proposed the Perceptual Flow Network (PFlowNet) to improve visual reasoning in Large-Vision Language Models (LVLMs). PFlowNet decouples perception from reasoning and uses variational reinforcement learning to guide perceptual behavior, with the aim of reducing language bias and hallucination. The authors report state-of-the-art results on benchmarks including V* Bench and MME-RealWorld-lite. A related model, VGR, grounds its language reasoning in detected image regions and reports significant improvements on benchmarks such as ChartQA while using fewer image tokens.

IMPACT: Introduces novel architectures for multimodal reasoning, potentially improving accuracy and reducing hallucinations in vision-language models.

RANK_REASON: The cluster contains two arXiv papers detailing new models and methods for visual reasoning.


AI-generated summary · Google Gemini · from 3 sources.

COVERAGE [3]

arXiv cs.CV TIER_1 English(EN) · Yangfu Li, Yuning Gong, Hongjian Zhan, Teng Li, Yuanhuiyi Lyu, Tianyi Chen, Qi Liu, Ziyuan Huang, Zhihang Zhong, Dandan Zheng, Yue Lu · 2026-05-05 04:00

Perceptual Flow Network for Visually Grounded Reasoning

arXiv:2605.02730v1 · Announce Type: new · Abstract: Despite the success of Large-Vision Language Models (LVLMs), general optimization objectives (e.g., standard MLE) fail to constrain visual trajectories, leading to language bias and hallucination. To mitigate this, current methods i…

arXiv cs.CV TIER_1 English(EN) · Yue Lu · 2026-05-04 15:31

Perceptual Flow Network for Visually Grounded Reasoning

Despite the success of Large-Vision Language Models (LVLMs), general optimization objectives (e.g., standard MLE) fail to constrain visual trajectories, leading to language bias and hallucination. To mitigate this, current methods introduce geometric priors from visual experts as…

arXiv cs.CV TIER_1 English(EN) · Jiacong Wang, Zijian Kang, Haochen Wang, Haiyong Jiang, Jiawen Li, Bohong Wu, Ya Wang, Jiao Ran, Xiao Liang, Chao Feng, Jun Xiao · 2026-05-04 04:00

VGR: Visual Grounded Reasoning

arXiv:2506.11991v3 · Announce Type: replace · Abstract: In the field of multimodal chain-of-thought (CoT) reasoning, existing approaches predominantly rely on reasoning on pure language space, which inherently suffers from language bias and is largely confined to math or science doma…

