Rigel: Reverse-Engineering the Metal 4.1 Tensor Compute Path on the Apple M4 Max GPU

The Apple M4 Max GPU's tensor compute path is emulated, not accelerated

By the PulseAugur editorial team · [1 source] · 2026-06-12 04:00

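Researchers reverse-engineered the Metal 4.1 tensor compute path on the Apple M4 Max GPU and found that the fp8 matmul2d operation is emulated rather than hardware-accelerated. The operation runs on the GPU's shader cores, accumulates in at least fp32 precision, and uses neither a dedicated matrix datapath nor the Apple Neural Engine. The findings are detailed in a paper titled "Rigel" and rest on empirical characterization and microbenchmarking. The work also produced a fused kernel that outperforms the decomposed path by 12.9%.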
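To make "emulated with fp32 accumulation" concrete, here is a minimal Python sketch of what such a path does in principle: decode each fp8 operand to an ordinary float, multiply, and keep the running sum rounded to fp32. It is an illustration only and not the Metal implementation. The summary does not say which fp8 encoding is used, so the OCP E4M3 layout (1 sign bit, 4 exponent bits, 3 mantissa bits, bias 7) is an assumption, and the function names `decode_e4m3`, `to_fp32`, and `emulated_fp8_matmul` are hypothetical.

```python
import struct


def decode_e4m3(byte):
    """Decode one fp8 byte to a Python float, ASSUMING the OCP E4M3 layout."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0x0F
    man = byte & 0x07
    if exp == 0x0F and man == 0x07:
        return float("nan")  # E4M3 encodes NaN here and has no infinities
    if exp == 0:
        return sign * (man / 8.0) * 2.0 ** -6  # subnormal values
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)


def to_fp32(x):
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def emulated_fp8_matmul(a, b):
    """Multiply a (M x K) by b (K x N), both given as nested lists of fp8 bytes.

    Each product is formed from decoded operands and the accumulator is
    rounded to fp32 after every addition, mimicking fp32 accumulation.
    """
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                prod = decode_e4m3(a[i][p]) * decode_e4m3(b[p][j])
                acc = to_fp32(acc + prod)
            out[i][j] = acc
    return out


# Hypothetical inputs: 0x38 = 1.0, 0x40 = 2.0, 0x3C = 1.5, 0xB8 = -1.0
result = emulated_fp8_matmul([[0x38, 0x40]], [[0x3C], [0xB8]])
```

In this sketch the fp8 values are only a storage format: every arithmetic step happens in a wider type, which is the general shape of an emulated path as opposed to one served by a dedicated matrix unit.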

Impact: Reveals that a key tensor operation is emulated on Apple hardware, which affects performance expectations for AI models.

Ranking rationale: Academic paper detailing an empirical characterization of hardware behavior.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 1 source. How we write summaries →

Sources [1]

arXiv cs.CL TIER_1 English(EN) · Ramchand Kumaresan · 2026-06-12 04:00

Rigel: Reverse-Engineering the Metal 4.1 Tensor Compute Path on the Apple M4 Max GPU

arXiv:2606.12765v1 Announce Type: new Abstract: Apple's Metal 4.1 exposes a tensor compute path: the Metal Performance Primitives (MPP) matmul2d operation over cooperative_tensor fragments, whose interface is documented but whose hardware behavior is deliberately hidden. The spec…

