PulseAugur
Hide to Guide: Learning via Semantic Masking

New SMEPO technique improves AI reasoning by masking expert traces

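Researchers have developed a technique called Semantic Masked Expert Policy Optimization (SMEPO) to improve reinforcement learning for language models. SMEPO targets the problem of models merely copying expert traces instead of genuinely reasoning, and it does so by semantically masking key information in those traces. The model must then reconstruct the missing elements while still following the expert's overall problem-solving structure. SMEPO showed improved accuracy and significantly shorter training time across several domains, including mathematics and coding.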
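
The masking idea described above can be illustrated with a toy sketch. This is not the SMEPO implementation: the summary does not say how "key information" is selected, so the example below simply assumes that numeric values in a worked solution are the key information and hides them. The `[MASK]` token, the function name `mask_expert_trace`, and the sample trace are all hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical placeholder token; the source does not specify one.
MASK = "[MASK]"

def mask_expert_trace(trace: str) -> Tuple[str, List[str]]:
    """Hide every numeric literal in an expert trace.

    Assumption for illustration only: numbers stand in for the
    "key information" that the summary says is masked. The step
    structure of the trace (the surrounding words) is left intact.
    Returns the masked trace and the hidden values in order.
    """
    hidden: List[str] = []

    def _hide(match: "re.Match[str]") -> str:
        # Record the hidden value, then substitute the mask token.
        hidden.append(match.group(0))
        return MASK

    masked = re.sub(r"\d+(?:\.\d+)?", _hide, trace)
    return masked, hidden

# A made-up expert trace for a simple arithmetic problem.
trace = "First, add 12 and 30 to get 42. Then halve 42 to get 21."
masked, hidden = mask_expert_trace(trace)
print(masked)
print(hidden)
```

In this sketch the masked trace keeps the order of operations ("add", then "halve") but no longer contains the intermediate or final values, so a model conditioned on it would have to recompute them rather than copy them.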

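Impact: This approach could make training AI models for complex reasoning more efficient, lowering compute costs and improving performance.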

Ranking rationale: This cluster includes an academic paper detailing a new method for improving AI model training.


AI-generated summary · Google Gemini · from 4 sources.

Sources [4]

  1. arXiv cs.AI TIER_1 English(EN) · Qi Liu, Mingdi Sun, Yongyi He, Zhi Zheng, Tong Xu, Yi Zheng, Zhefeng Wang, Enhong Chen ·

    Entropy-KL Divergence-based Token Masking: A Novel Approach for Selective Fine-tuning of Large Language Models

    arXiv:2605.29303v1. Abstract: Supervised fine-tuning (SFT) followed by reinforcement learning (RL) has become a standard post-training paradigm for large language models. This paradigm provides a cold-start for RL exploration, avoiding the inefficiency of pure R…

  2. arXiv cs.AI TIER_1 English(EN) · Gokul Srinivasagan, Kai Hartung, Munir Georges ·

    Entropy-aware Masking for Masked Language Modeling

    arXiv:2605.28526v1. Abstract: Masked language modeling has become a standard pretraining objective for training encoder-based language models. In this approach, certain tokens in the input are masked, and the model learns to predict them using the surrounding co…

  3. arXiv cs.AI TIER_1 English(EN) · Munir Georges ·

    Entropy-aware Masking for Masked Language Modeling

    Masked language modeling has become a standard pretraining objective for training encoder-based language models. In this approach, certain tokens in the input are masked, and the model learns to predict them using the surrounding context. This process enables the model to capture…

  4. arXiv cs.AI TIER_1 English(EN) · Ruitao Liu, Qinghao Hu, Alex Hu, Yecheng Wu, Shang Yang, Luke J. Huang, Zhuoyang Zhang, Han Cai, Song Han ·

    Hide to Guide: Learning via Semantic Masking

    arXiv:2605.25198v1 (cross-listed). Abstract: Reinforcement learning with verifiable rewards (RLVR) has become a powerful paradigm for improving language models on reasoning-intensive tasks, but its effectiveness is often limited by exploration. For example, models often fail…