MiniMax's M3 runs on about a twentieth of the compute per token of its last model. Vendor figures: 9x faster prefill and 15x faster decode at a 1M-token context

MiniMax M3 achieves 9x faster prefill and 15x faster decode at a 1M-token context

By the PulseAugur editorial team · [1 source] · 2026-06-16 02:04

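MiniMax has released its M3 model, which reportedly uses about one-twentieth of the compute per token of its predecessor. The company claims the new model is 9x faster in the prefill phase and 15x faster in the decode phase at a one-million-token context.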

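Impact: If the vendor figures hold, the release marks a substantial gain in large language model inference efficiency, which could lower costs and enable new applications that rely on larger context windows.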

Ranking rationale: MiniMax M3 model release, accompanied by a system card.



AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

Mastodon (mastodon.social) · English (EN) · datarazimedia · 2026-06-16 02:04


MiniMax's M3 runs on about a twentieth of the compute per token of its last model. Vendor figures: 9x faster prefill and 15x faster decode at a 1M-token context, via a new sparse attention scheme that only bothers with the relevant bits of the prompt. Net effect: long-context AI …

Link: youtube.com/watch

