PulseAugur
Search-E1: Self-Distillation Drives Self-Evolution in Search-Augmented Reasoning

Search-E1 simplifies agent training through self-evolution

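Researchers have introduced Search-E1, a self-evolution method for search-augmented reasoning agents that bypasses complex external supervision. It combines vanilla GRPO with offline self-distillation (OFSD), so the agent can improve on its own. With the Qwen2.5-3B model, the method reaches an average exact-match (EM) score of 0.440 across seven QA benchmarks, outperforming existing open-source baselines.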
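For readers unfamiliar with the metric, the following is a minimal sketch of how an average EM score over several QA benchmarks is commonly computed. It is not taken from the paper: the SQuAD-style answer normalization, the unweighted mean across benchmarks, the function names, and the toy data are all assumptions made for illustration.

```python
import re
import string


def normalize_answer(text):
    """Lowercase, strip punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; the paper's exact rules may differ.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold_answers):
    """Return 1.0 if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(gold) for gold in gold_answers))


def benchmark_em(examples):
    """Mean EM over one benchmark; `examples` is a list of (prediction, gold_answers)."""
    scores = [exact_match(pred, golds) for pred, golds in examples]
    return sum(scores) / len(scores)


def average_em(per_benchmark):
    """Unweighted mean of per-benchmark EM (assumed averaging scheme)."""
    per_bench = {name: benchmark_em(ex) for name, ex in per_benchmark.items()}
    return sum(per_bench.values()) / len(per_bench)


# Hypothetical toy data, not results from the paper.
toy_results = {
    "bench_a": [("The Eiffel Tower", ["Eiffel Tower"]), ("Paris", ["London"])],
    "bench_b": [("42", ["42", "forty-two"])],
}
toy_average = average_em(toy_results)
print(f"average EM on toy data: {toy_average:.3f}")
```

Averaging per benchmark first gives each dataset equal weight regardless of its size; pooling all examples before averaging would instead favor the larger benchmarks.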

Impact: Simplifies the training of search-augmented reasoning agents, which may make them more accessible and more efficient.

Ranking rationale: This cluster contains a paper detailing a new training method for AI agents.

Read on arXiv cs.AI →

AI-generated summary · Google Gemini · from 2 sources. How we write summaries →

Sources [2]

  1. arXiv cs.CL TIER_1 English(EN) · Zihan Liang, Yufei Ma, Ben Chen, Zhipeng Qian, Xuxin Zhang, Huangyu Dai, Lingtao Mao

    Search-E1: Self-Distillation Drives Self-Evolution in Search-Augmented Reasoning

    arXiv:2605.22511v1 (cross-listed). Abstract: Post-training has become the dominant recipe for turning a language model into a competent search-augmented reasoning agent. A line of recent work pushes its performance further by adding elaborate machinery on top of this standar…

  2. arXiv cs.AI TIER_1 English(EN) · Lingtao Mao

    Search-E1: Self-Distillation Drives Self-Evolution in Search-Augmented Reasoning

    Post-training has become the dominant recipe for turning a language model into a competent search-augmented reasoning agent. A line of recent work pushes its performance further by adding elaborate machinery on top of this standard pipeline. These augmentations import external su…