LoKA: Low-precision Kernel Applications for Recommendation Models At Scale

LoKA framework brings low-precision FP8 arithmetic to large recommendation models

By the PulseAugur editorial team · [1 source] · 2026-05-11 17:32

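Researchers have developed LoKA, a framework designed to make low-precision arithmetic, specifically FP8, practical for large recommendation models (LRMs). Earlier attempts often degraded model quality; LoKA instead takes a system-model co-design approach built on three components: statistical analysis that identifies where FP8 can be adopted safely, model adaptations that improve stability and efficiency, and a runtime that selects the best FP8 kernel according to precision requirements.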
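
The summary does not say how LoKA's statistical analysis or kernel-selection runtime actually work. As a rough illustration only, the sketch below simulates FP8 E4M3 rounding in pure Python, measures round-trip error under two common scaling granularities (one scale per tensor, one scale per row), and picks the cheapest option whose worst-row error fits a tolerance, falling back to higher precision otherwise. The amax-based scaling, the worst-row relative RMS error metric, and the option names `fp8_per_tensor`, `fp8_per_row` and `bf16_fallback` are hypothetical assumptions for illustration, not LoKA's method.

```python
import math
import random

# Assumed format: OCP FP8 E4M3 ("E4M3FN"), 4 exponent bits (bias 7), 3 mantissa bits.
FP8_E4M3_MAX = 448.0       # largest finite E4M3 magnitude
FP8_E4M3_MIN_EXP = -6      # exponent of the smallest normal value, 2**-6
FP8_E4M3_MANT_BITS = 3     # explicit mantissa bits


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3 value (ties to even), saturating at +/-448."""
    if x == 0.0 or math.isnan(x):
        return x
    a = min(abs(x), FP8_E4M3_MAX)
    _, e = math.frexp(a)                      # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, FP8_E4M3_MIN_EXP)        # subnormals share the 2**-6 spacing
    step = 2.0 ** (exp - FP8_E4M3_MANT_BITS)  # gap between adjacent E4M3 values
    q = min(round(a / step) * step, FP8_E4M3_MAX)  # Python's round() is ties-to-even
    return math.copysign(q, x)


def amax_scale(values):
    """Scale that maps the largest magnitude onto the top of the E4M3 range."""
    amax = max((abs(v) for v in values), default=0.0)
    return FP8_E4M3_MAX / amax if amax > 0 else 1.0


def fp8_roundtrip(values, scale):
    """Quantize scaled values to E4M3, then dequantize back to float."""
    return [round_to_e4m3(v * scale) / scale for v in values]


def relative_rms_error(ref, approx):
    """||ref - approx|| / ||ref||, or 0.0 for an all-zero reference."""
    num = sum((r - a) ** 2 for r, a in zip(ref, approx))
    den = sum(r * r for r in ref)
    return math.sqrt(num / den) if den > 0 else 0.0


def worst_row_error(rows, scales):
    """Largest per-row relative RMS error after an FP8 round trip."""
    return max(relative_rms_error(row, fp8_roundtrip(row, s))
               for row, s in zip(rows, scales))


def choose_kernel(rows, tolerance):
    """Hypothetical policy: cheapest option whose measured error fits the tolerance."""
    flat = [v for row in rows for v in row]
    # Option 1: a single scale for the whole tensor (least bookkeeping).
    err = worst_row_error(rows, [amax_scale(flat)] * len(rows))
    if err <= tolerance:
        return "fp8_per_tensor", err
    # Option 2: one scale per row (protects small rows from a large outlier row).
    err = worst_row_error(rows, [amax_scale(row) for row in rows])
    if err <= tolerance:
        return "fp8_per_row", err
    # Option 3: FP8 is not accurate enough here; stay in higher precision.
    return "bf16_fallback", err


random.seed(0)
rows = [[random.gauss(0.0, 1.0) for _ in range(64)] for _ in range(8)]
print(choose_kernel(rows, tolerance=0.05))    # rows of similar magnitude
rows[0] = [v * 1e6 for v in rows[0]]          # one row with a huge dynamic range
print(choose_kernel(rows, tolerance=0.05))
print(choose_kernel(rows, tolerance=0.001))   # stricter than FP8 can deliver
```

The outlier row is only a toy stand-in for the numerical sensitivity the abstract mentions: with one tensor-wide scale, the ordinary rows underflow to zero in FP8, while per-row scales keep every row's relative error at a few percent. Ordering the options from cheapest to most precise is one plausible design; a real runtime would also weigh measured kernel speed.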

Impact: By exploiting low-precision hardware, LoKA enables more efficient training and inference for large recommendation models.

Ranking rationale: This cluster contains an academic paper detailing a new framework for applying low-precision arithmetic to recommendation models.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · Chunqiang Tang · 2026-05-11 17:32

LoKA: Low-precision Kernel Applications for Recommendation Models At Scale

Recent GPU generations deliver significantly higher FLOPs using lower-precision arithmetic, such as FP8. While successfully applied to large language models (LLMs), its adoption in large recommendation models (LRMs) has been limited. This is because LRMs are numerically sensitive…
