Researchers have introduced GeoMin, a method designed to improve the data efficiency of semi-supervised reinforcement learning with verifiable rewards (RLVR). The approach models global feature distributions from labeled data to identify discrepancies between correct and incorrect model outputs. By establishing a reliable prior for self-reward signals, GeoMin aims to make better use of unlabeled data, and it is reported to outperform existing baselines and even fully supervised models while using significantly fewer annotations.
AI IMPACT Enhances LLM reasoning capabilities by improving data efficiency in training, potentially reducing annotation costs.
AI-generated summary · Google Gemini · from 1 source.