PulseAugur

New HCM-GRPO method boosts AI's physical reasoning in image screening

Researchers have developed HCM-GRPO, a method for improving how well multimodal large language models (MLLMs) reason about physical plausibility. It adds a Hard Cases Mining strategy and a Dynamic Proportional Accuracy reward to the Group Relative Policy Optimization (GRPO) framework. To support the work, the authors built a dataset of over 128,000 samples, comprising around 640,000 images, that evaluates reasoning across appearance, shadow, layout, and extension rationality. In their experiments, even advanced models such as GPT5.2 and Gemini3-Pro struggled with this task, while a smaller model trained with HCM-GRPO achieved better results.
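For readers unfamiliar with GRPO, the sketch below shows the group-relative step that gives the framework its name: several answers are sampled for one prompt, and each answer's reward is standardized against the mean and spread of its own group. The summary does not say how Hard Cases Mining or the Dynamic Proportional Accuracy reward are defined, so `is_hard_case` and its `threshold` are hypothetical stand-ins, and the reward values are made up for illustration. None of this is the authors' implementation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward against its own group (GRPO-style).

    Uses the population standard deviation; implementations differ on
    this detail. eps avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def is_hard_case(rewards, threshold=0.5):
    """Hypothetical filter, NOT the paper's definition: flag a prompt as
    hard when the mean reward of its sampled answers is below threshold."""
    return mean(rewards) < threshold


# Rewards for four sampled answers to one prompt (illustrative values).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Answers that score above their group's average receive positive advantages and those below receive negative ones, so no separate value network is needed to supply a baseline.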

IMPACT Improves AI's ability to judge whether images are physically plausible, potentially improving image screening and content moderation.

RANK_REASON The cluster contains an academic paper detailing a new method and dataset for improving AI model capabilities.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.CV TIER_1 English (EN) · Zhiyuan Hu, Zheng Sun, Yi Wei, Long Yu

    Physical Plausibility Reasoning via HCM-GRPO: Empowering Compact Model for Superior Performance

    arXiv:2511.10055v2 Announce Type: replace Abstract: The performance of image generation has been significantly improved in recent years. However, the study of image screening is rare, and its performance with Multimodal Large Language Models (MLLMs) is unsatisfactory due to the l…