PulseAugur

New GeoMin method boosts data efficiency in semi-supervised RLVR

Researchers have introduced GeoMin, a method for improving the data efficiency of semi-supervised reinforcement learning with verifiable rewards (RLVR). The approach models global feature distributions from labeled data to identify discrepancies between correct and incorrect model outputs. This gives it a more reliable prior for self-reward signals, which GeoMin uses to make better use of unlabeled data. The method is reported to outperform existing baselines, and even fully supervised models, while using significantly fewer annotations.
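The summary does not give GeoMin's actual formulation, so the following is only a rough illustration of the general idea of scoring unlabeled outputs against feature distributions fitted on labeled ones. It assumes, purely for illustration, that each model output is already embedded as a feature vector, that correct and incorrect outputs are each modeled by a single Gaussian, and that the pseudo-reward is the difference of squared Mahalanobis distances. The function names (`fit_gaussian`, `pseudo_reward`) and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np

def fit_gaussian(features, ridge=1e-3):
    """Fit a Gaussian to feature vectors; return (mean, inverse covariance).

    The ridge term keeps the covariance invertible (an illustrative choice).
    """
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False) + ridge * np.eye(features.shape[1])
    return mean, np.linalg.inv(cov)

def mahalanobis_sq(x, mean, cov_inv):
    """Squared Mahalanobis distance of each row of x from the Gaussian."""
    diff = x - mean
    return np.einsum("...i,ij,...j->...", diff, cov_inv, diff)

def pseudo_reward(x, correct_stats, incorrect_stats):
    """Positive when x lies closer to the 'correct' distribution.

    This scoring rule is an assumption for illustration, not GeoMin's reward.
    """
    return mahalanobis_sq(x, *incorrect_stats) - mahalanobis_sq(x, *correct_stats)

# Synthetic stand-ins for features of labeled correct / incorrect outputs.
rng = np.random.default_rng(0)
correct = rng.normal(loc=1.0, scale=0.5, size=(200, 4))
incorrect = rng.normal(loc=-1.0, scale=0.5, size=(200, 4))

correct_stats = fit_gaussian(correct)
incorrect_stats = fit_gaussian(incorrect)

# Two unlabeled feature vectors: one near each cluster centre.
unlabeled = np.array([[1.0, 1.0, 1.0, 1.0],
                      [-1.0, -1.0, -1.0, -1.0]])
rewards = pseudo_reward(unlabeled, correct_stats, incorrect_stats)
print(rewards)  # first entry is positive, second is negative
```

In a semi-supervised setup along these lines, the small labeled set would fix the two distributions and the resulting score would stand in for a verifier on unlabeled prompts. How GeoMin actually builds and uses its prior is described in the paper itself.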

IMPACT Enhances LLM reasoning capabilities by improving data efficiency in training, potentially reducing annotation costs.

RANK_REASON The cluster contains a research paper detailing a new method for improving LLM reasoning.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.AI TIER_1 English(EN) · Guangcheng Zhu, Shenzhi Yang, Haobo Wang, Xing Zheng, Yingfan MA, Xuening Feng, Zhongqi Chen, Kai Tang, Zhengqing Zang, Bowen Song, Weiqiang Wang, Gang Chen

    GeoMin: Data-Efficient Semi-Supervised RLVR via Geometric Distribution Modeling

    arXiv:2606.04516v1 Announce Type: cross Abstract: Reinforcement learning with verifiable rewards (RLVR) significantly advances LLM reasoning, yet it faces a dilemma: standard supervised scaling is throttled by high annotation costs, while unsupervised alternatives suffer from sev…