Ablation Study of Block Size, Weight Precision, and Scale Precision in NVFP4 Inference for Low-Power Edge-Efficient Neural Networks

New NVLUT framework substantially cuts the energy cost of edge AI inference

By the PulseAugur editorial team · [1 source] · 2026-06-08 04:00

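Researchers have developed a framework called NVLUT for energy-efficient neural network inference on edge devices. The framework uses 4-bit NVFP4 activations with a two-level scaling scheme, and it replaces conventional multiplications with accesses to compact lookup tables (LUTs). The study finds that a block size of 16 offers a good balance between accuracy and storage, and that FP8 and FP16 weights bring only marginal improvements over FP4 weights. Compared with existing methods, NVLUT significantly reduces both energy consumption and hardware area.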
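To make the summary above more concrete, here is a minimal Python sketch of block-scaled 4-bit quantization with a lookup-table dot product. It is an illustration under stated assumptions, not the paper's implementation: it assumes the standard E2M1 value grid for FP4, a 16 x 16 product table, FP4 weights, and block scales kept as ordinary Python floats rather than rounded to FP8. All function and variable names (`quantize_tensor`, `lut_dot`, `bits_per_element`, `PRODUCT_LUT`) are hypothetical.

```python
# Illustrative sketch only. Names and the LUT layout are assumptions,
# not taken from the NVLUT paper.

# Magnitudes representable by a 4-bit E2M1 float (1 sign, 2 exponent, 1 mantissa bit).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
# 16 codes: codes 0-7 are non-negative, codes 8-15 are the negated values.
FP4_VALUES = E2M1_MAGNITUDES + [-m for m in E2M1_MAGNITUDES]
FP4_MAX = 6.0    # largest E2M1 magnitude
FP8_MAX = 448.0  # largest E4M3 magnitude, used to size the tensor-level scale

# All possible FP4 x FP4 products, computed once and then only read.
PRODUCT_LUT = [[a * b for b in FP4_VALUES] for a in FP4_VALUES]


def nearest_code(target):
    """Return the 4-bit code whose value is closest to `target`."""
    return min(range(16), key=lambda c: abs(FP4_VALUES[c] - target))


def quantize_tensor(values, block_size=16):
    """Two-level scaling: one tensor-wide scale plus one scale per block."""
    amax = max(abs(v) for v in values)
    tensor_scale = amax / (FP4_MAX * FP8_MAX) if amax > 0 else 1.0
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        block_amax = max(abs(v) for v in block)
        # Block scale expressed relative to the tensor scale. A real NVFP4
        # encoder would round this to FP8 (E4M3); the sketch keeps a float.
        block_scale = block_amax / FP4_MAX / tensor_scale if block_amax > 0 else 1.0
        step = block_scale * tensor_scale
        codes = [nearest_code(v / step) for v in block]
        blocks.append((codes, block_scale))
    return blocks, tensor_scale


def lut_dot(x_q, w_q):
    """Dot product of two quantized tensors using table lookups per element."""
    (x_blocks, x_ts), (w_blocks, w_ts) = x_q, w_q
    total = 0.0
    for (x_codes, x_bs), (w_codes, w_bs) in zip(x_blocks, w_blocks):
        # Per-element work is a table read and an addition, no multiplication.
        partial = sum(PRODUCT_LUT[a][b] for a, b in zip(x_codes, w_codes))
        # Scales are applied once per block, not once per element.
        total += partial * x_bs * w_bs
    return total * x_ts * w_ts


def bits_per_element(block_size, value_bits=4, scale_bits=8):
    """Average storage cost when each block shares one scale."""
    return value_bits + scale_bits / block_size
```

The storage side of the block-size trade-off follows from simple arithmetic: assuming 4-bit values and one 8-bit scale per block, `bits_per_element(16)` gives 4 + 8/16 = 4.5 bits per element, while smaller blocks track local ranges more closely at a higher per-element cost.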

Impact: Enables more capable AI models to run on low-power edge devices, reducing energy consumption and hardware cost.

Ranking rationale: An academic paper detailing a new technique for AI inference.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.LG · Ovishake Sen, Venkata Nithin Kamineni, Daniel Lobo, Swarup Bhunia, Rickard Ewetz, Baibhab Chatterjee · 2026-06-08 04:00

Ablation Study of Block Size, Weight Precision, and Scale Precision in NVFP4 Inference for Low-Power Edge-Efficient Neural Networks

arXiv:2606.06527v1 (cross-listed). Abstract: Energy-efficient edge inference requires reducing arithmetic cost, memory traffic, and hardware overhead. This paper presents an ablation-focused study of NVFP4 LUT-based inference for edge-efficient neural networks. The proposed …
