PulseAugur

Fireworks AI details the complex RL infrastructure behind continuous model updates

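Fireworks AI lays out the engineering challenges and solutions involved in training large language models, with a particular focus on reinforcement learning (RL) from human feedback. The company argues that real-world use of a product is the most effective RL environment, and that this calls for infrastructure able to update models continuously from live user interactions. It also discusses the complexities of distributed RL, including numerical stability and the efficient synchronization of very large model weights across global clusters.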

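Impact: Fireworks AI's insights highlight the substantial engineering investment that advanced model training requires, especially in RL, and suggest that efficient infrastructure is key to continuous improvement.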

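Ranking rationale: This cluster consists of a series of X posts from Fireworks AI describing its engineering approach to model training and RL, rather than a direct product or model launch.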


AI-generated summary · Google Gemini · from 10 sources.

Sources [10]

  1. X — Fireworks (inference infra) TIER_1 English(EN) · FireworksAI_HQ ·

    10/ The bigger point: your product is the best RL environment you'll ever have. Frontier labs ship models that are good at everything. The opportunity is a model that's great at your thing. Product, users, harness. That's the moat. Check out the ep: https://t.co/j085PLDElj

  2. X — Fireworks (inference infra) TIER_1 English(EN) · FireworksAI_HQ ·

    9/ Real-time RL is where it gets fun. Catch live signals from real users on real generations. Update continuously. Ship a new version every few hours. Only works if the base model is already good enough that people want to use it. Real-time RL is the amplifier that runs on a

  3. X — Fireworks (inference infra) TIER_1 English(EN) · FireworksAI_HQ ·

    8/ Models cheat. RL rewards cheating. They figure out when they're in a sim versus production, and they learn tricks that score well in fake environments but fail for real. The RL environment has to look like production, or you're training a model that games the eval.

  4. X — Fireworks (inference infra) TIER_1 English(EN) · FireworksAI_HQ ·

    7/ There's a quiet numerical problem buried in distributed RL that will wreck a run. Floating point addition isn't associative, so inference and training produce slightly different log probs for the same tokens. In an MoE model, a tiny difference can flip which expert activates,

  5. X — Fireworks (inference infra) TIER_1 English(EN) · FireworksAI_HQ ·

    6/ Syncing 1TB of weights across four global clusters every 5 to 10 minutes is its own engineering problem. Trick is that RL only updates a subset of weights per step. A lossless delta compression scheme shrinks the transfer about 20x. Weights ship in under a minute. Inference

  6. X — Fireworks (inference infra) TIER_1 English(EN) · FireworksAI_HQ ·

    5/ The thing that made the math work was async (pipelined) RL. Naive RL pauses training while rollouts run. Half the GPUs sit idle. Pipelined RL runs trainer and rollout workers at the same time. You eat a little staleness, but utilization goes way up. The bitter lesson wins

  7. X — Fireworks (inference infra) TIER_1 English(EN) · FireworksAI_HQ ·

    4/ RL infrastructure is harder to build than pre-training infrastructure. A lot harder. Pre-training needs a big cluster. RL needs a big cluster, plus a whole inference fleet running rollouts that look like what users actually do. A rollout here is a full 50-turn Cursor agent

  8. X — Fireworks (inference infra) TIER_1 English(EN) · FireworksAI_HQ ·

    3/ Two pushes got them there. Mid-training on code at near pre-training scale to teach the model to write code. Large-scale RL on top to teach it to write correct code. Both required.

  9. X — Fireworks (inference infra) TIER_1 English(EN) · FireworksAI_HQ ·

    2/ The mental model Federico opens with is the one that reframes everything. A model is a storage drive. Finite bits. You decide what goes in. Cursor cares about software engineering inside Cursor. Spend every bit on that one job, and the model ends up running roughly 10x

  10. X — Fireworks (inference infra) TIER_1 English(EN) · FireworksAI_HQ ·

    1/ Composer 2.5 is having a moment. Worth a look at how the team actually got here. @cursor_ai's Federico Cassano and @FireworksAI_HQ cofounder Dima Dzhulgakov discussed Training Data with @sonyatweetybird. The whole episode is worth your time, but we’ll break it down here.
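
Post 7/ in the list above says that floating-point addition is not associative and that, in an MoE model, a tiny difference can flip which expert activates. The sketch below illustrates that claim with a toy top-1 router. The names `sum_in_order` and `top1_expert`, the two-expert setup, and the idea that "inference" and "training" differ only in summation order are all assumptions made for illustration, not a description of Fireworks' stack.

```python
def sum_in_order(terms):
    """Accumulate left to right, the way a simple reduction kernel would."""
    total = 0.0
    for term in terms:
        total += term
    return total


def top1_expert(per_expert_terms, reverse):
    """Return (index of the winning expert, logits) for a toy top-1 router.

    Each expert's logit is the sum of its terms. `reverse` only changes the
    order in which the same terms are added (hypothetical stand-in for two
    kernels that reduce in different orders).
    """
    logits = [
        sum_in_order(reversed(terms) if reverse else terms)
        for terms in per_expert_terms
    ]
    winner = max(range(len(logits)), key=logits.__getitem__)
    return winner, logits


# Same three numbers, different grouping, different rounding.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

# Two experts whose exact logits are mathematically equal (both sum to 0.6).
expert_terms = [[0.1, 0.2, 0.3], [0.3, 0.2, 0.1]]

inference_expert, inference_logits = top1_expert(expert_terms, reverse=False)
training_expert, training_logits = top1_expert(expert_terms, reverse=True)

print(inference_expert, inference_logits)
print(training_expert, training_logits)
```

The two experts are deliberately set up as a near-tie, which is the case where a one-ulp rounding difference is enough to change the routing decision.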
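
Post 6/ in the list above mentions a lossless delta compression scheme that exploits the fact that RL only updates a subset of weights per step. The thread does not describe the scheme itself, so the following is only a minimal sketch of one way such a delta could work: XOR the old and new weight bytes (unchanged weights become runs of zeros) and compress the result with `zlib`. The function names, the float32 packing, and the 2% update fraction are assumptions chosen for the demo, and the compression ratio it prints is not the figure quoted in the post.

```python
import random
import struct
import zlib


def pack_float32(weights):
    """Serialize a list of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(weights)}f", *weights)


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("buffers must have the same length")
    value = int.from_bytes(a, "little") ^ int.from_bytes(b, "little")
    return value.to_bytes(len(a), "little")


def make_delta(old_bytes, new_bytes):
    """Compressed XOR delta; mostly zeros when few weights changed."""
    return zlib.compress(xor_bytes(old_bytes, new_bytes))


def apply_delta(old_bytes, delta):
    """Reconstruct the new weights exactly from the old weights and a delta."""
    return xor_bytes(old_bytes, zlib.decompress(delta))


random.seed(0)
old_weights = [random.gauss(0.0, 1.0) for _ in range(50_000)]
new_weights = list(old_weights)
# Hypothetical RL step: only 2% of the weights receive an update.
for index in random.sample(range(len(old_weights)), k=len(old_weights) // 50):
    new_weights[index] += 0.01 * random.gauss(0.0, 1.0)

old_bytes = pack_float32(old_weights)
new_bytes = pack_float32(new_weights)

delta = make_delta(old_bytes, new_bytes)
restored = apply_delta(old_bytes, delta)

print("full size :", len(new_bytes))
print("delta size:", len(delta))
print("lossless  :", restored == new_bytes)
```

Because XOR is its own inverse, the receiver recovers the new weights bit for bit, which is what makes the delta lossless rather than approximate.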
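
Post 5/ in the list above contrasts naive RL, which pauses training while rollouts run, with pipelined RL, where trainer and rollout workers run at the same time at the cost of a little staleness. The sketch below shows that structure with two threads and a bounded queue. Everything in it is hypothetical: `run_pipelined`, the integer "policy version", and the tuples standing in for rollouts are illustrative only, and no real model or GPU work is involved.

```python
import queue
import threading


def run_pipelined(num_rollouts=20, batch_size=4):
    """Run a rollout worker and a trainer concurrently.

    Returns the final policy version and, for every trained sample, its
    staleness: how many policy updates happened between generating the
    sample and training on it.
    """
    rollouts = queue.Queue(maxsize=8)  # bounded so rollouts cannot run far ahead
    policy_version = [0]               # shared counter, guarded by `lock`
    lock = threading.Lock()
    staleness = []

    def rollout_worker():
        for sample_id in range(num_rollouts):
            with lock:
                version = policy_version[0]
            # Tag each rollout with the policy version that produced it.
            rollouts.put((sample_id, version))
        rollouts.put(None)  # sentinel: no more rollouts

    def trainer():
        batch = []
        while True:
            item = rollouts.get()
            if item is None:
                break
            batch.append(item)
            if len(batch) == batch_size:
                with lock:
                    current = policy_version[0]
                    staleness.extend(current - version for _, version in batch)
                    policy_version[0] = current + 1  # one "gradient step"
                batch.clear()

    threads = [
        threading.Thread(target=rollout_worker),
        threading.Thread(target=trainer),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return policy_version[0], staleness


final_version, observed_staleness = run_pipelined()
print("policy updates:", final_version)
print("max staleness :", max(observed_staleness))
```

The rollout worker never waits for a training step to finish, so some samples are trained on after the policy has already moved on. The recorded staleness makes that trade-off visible, and the queue bound is one simple way to cap it.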