Training a Qwen 3.5 4B/9B agent for multi-tool use: SFT first or go directly to RL?

Reddit discussion on how to train Qwen 3.5 agents

By the PulseAugur editorial team · [1 source] · 2026-06-23 05:53

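A Reddit user is asking for advice on training Qwen 3.5 models for multi-tool agent use. They want to know whether to run supervised fine-tuning (SFT) first and then reinforcement learning (RL), or to go straight to pure RL. The user also asks how to design effective reward functions for tool-using agents, and how to handle parallel tool execution, particularly when one tool's output requires several follow-up tool calls.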
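The thread leaves reward design as an open question, and the summary reports no answer. Purely as an illustration, here is a minimal Python sketch of one commonly discussed pattern for tool-use RL: a rule-based reward that combines a format term (does each tool call name a known tool and pass JSON arguments with the required keys?), an outcome term (does the final answer match a reference?), and a small penalty for calls beyond a budget. The tool names `search` and `fetch_page`, the `TOOL_SCHEMAS` registry, the `ToolCall`/`Trajectory` types, and all weights are hypothetical placeholders, not taken from the Reddit post or from any specific training library.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str
    arguments: str  # raw JSON string as emitted by the model


@dataclass
class Trajectory:
    calls: list[ToolCall] = field(default_factory=list)
    final_answer: str = ""


# Hypothetical tool registry (assumption): tool name -> required argument keys.
TOOL_SCHEMAS = {
    "search": {"query"},
    "fetch_page": {"url"},
}


def call_is_valid(call: ToolCall) -> bool:
    """A call counts as valid if the tool exists and its arguments parse
    to a JSON object containing every required key."""
    required = TOOL_SCHEMAS.get(call.name)
    if required is None:
        return False
    try:
        args = json.loads(call.arguments)
    except json.JSONDecodeError:
        return False
    return isinstance(args, dict) and required.issubset(args)


def reward(traj: Trajectory, expected_answer: str,
           w_format: float = 0.2, w_success: float = 1.0,
           step_penalty: float = 0.05, max_free_calls: int = 3) -> float:
    # Format term: fraction of well-formed calls (1.0 if no calls were made).
    if traj.calls:
        fmt = sum(call_is_valid(c) for c in traj.calls) / len(traj.calls)
    else:
        fmt = 1.0
    # Outcome term: exact match after trimming whitespace and lowercasing.
    success = float(
        traj.final_answer.strip().lower() == expected_answer.strip().lower()
    )
    # Efficiency term: penalize each call beyond the free budget.
    extra = max(0, len(traj.calls) - max_free_calls)
    return w_format * fmt + w_success * success - step_penalty * extra
```

Weighting the outcome term well above the format term is meant to keep the policy from earning most of its reward through well-formed but useless calls. Exact-match scoring only suits tasks with a single checkable answer; open-ended tasks would need a different outcome check.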

Impact: Covers training approaches for multi-tool agents, which is relevant to developers building specialized AI applications.

Ranking rationale: User-generated discussion of training methods for a specific model.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

r/LocalLLaMA · Tier 1 · English · /u/siri_1110 · 2026-06-23 05:53

Training a Qwen 3.5 4B/9B agent for multi-tool use: SFT first or go directly to RL?

I'm looking to train Qwen 3.5 4B or 9B for a custom multi-tool agent workflow and would appreciate guidance from people who have done this successfully. A few questions:

1. SFT → RL or RL-only? Is it still recommended to first do…
