Search-E1: Self-Distillation Drives Self-Evolution in Search-Augmented Reasoning

Search-E1 method simplifies agent training through self-evolution

By the PulseAugur editorial team · [2 sources] · 2026-05-21 14:00

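Researchers have introduced Search-E1, a novel self-evolution method for search-augmented reasoning agents that bypasses complex external supervision. The method combines vanilla GRPO with offline self-distillation (OFSD), allowing the agent to improve on its own. Using the Qwen2.5-3B model, it achieves an average exact match (EM) score of $0.440$ across seven QA benchmarks, outperforming existing open-source baselines.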
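To make the two quantities named in the summary concrete, here is a minimal sketch of how an exact match (EM) score and GRPO's group-relative advantage are commonly computed. It is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, the SQuAD-style answer normalization is assumed, and using EM as the rollout reward is an assumption the summary does not confirm.

```python
import re
import string
from statistics import mean, pstdev


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; the summary does not say which
    normalization the paper uses.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> float:
    """Return 1.0 if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(gold) for gold in gold_answers))


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Standardize rewards within one group of rollouts for the same question.

    This is the group-relative baseline that GRPO uses in place of a learned
    value function: each rollout is scored against its own group's mean.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def macro_average_em(per_benchmark_em: dict[str, float]) -> float:
    """Unweighted mean of per-benchmark EM scores (hypothetical helper)."""
    return mean(list(per_benchmark_em.values()))
```

In use, a group of rollouts for one question would each be scored with `exact_match`, and the resulting rewards passed to `group_relative_advantages`. Correct rollouts then receive positive advantages and incorrect ones negative, and a group where every rollout gets the same reward contributes zero advantage.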

Impact: Simplifies the training of search-augmented reasoning agents, potentially making it more accessible and efficient.

Ranking rationale: This cluster contains a paper detailing a new training method for AI agents.


AI-generated summary · Google Gemini · from 2 sources.

Coverage sources [2]

arXiv cs.CL TIER_1 English(EN) · Zihan Liang, Yufei Ma, Ben Chen, Zhipeng Qian, Xuxin Zhang, Huangyu Dai, Lingtao Mao · 2026-05-22 04:00

Search-E1: Self-Distillation Drives Self-Evolution in Search-Augmented Reasoning

arXiv:2605.22511v1 · Announce Type: cross
Abstract: Post-training has become the dominant recipe for turning a language model into a competent search-augmented reasoning agent. A line of recent work pushes its performance further by adding elaborate machinery on top of this standar…

arXiv cs.AI TIER_1 English(EN) · Lingtao Mao · 2026-05-21 14:00

Search-E1: Self-Distillation Drives Self-Evolution in Search-Augmented Reasoning

Post-training has become the dominant recipe for turning a language model into a competent search-augmented reasoning agent. A line of recent work pushes its performance further by adding elaborate machinery on top of this standard pipeline. These augmentations import external su…
