PulseAugur
HGQ-LUT: Fast LUT-Aware Training and Efficient Architectures for DNN Inference

New HGQ-LUT and da4ml methods speed up DNN training and FPGA deployment

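Researchers have developed HGQ-LUT, a new method for training lookup-table (LUT) based neural networks that speeds up training by more than 100x on modern GPUs. The method introduces specialized layers and fine-grained quantization to explore accuracy-resource trade-offs automatically, without manual tuning. HGQ-LUT is integrated into an open-source toolchain, enabling practical deployment of these efficient DNNs for applications such as the CERN Large Hadron Collider.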

Impact: Speeds up training of DNNs targeting FPGAs, enabling more efficient real-time inference for demanding applications.

Ranking rationale: This is an academic paper detailing a new training method for DNNs deployed on FPGAs.

Read on arXiv cs.LG →

AI-generated summary · Google Gemini · from 3 sources.

Sources [3]

  1. arXiv cs.LG TIER_1 English(EN) · Chang Sun, Zhiqiang Que, Bakhtiar Zadeh, Qibin Liu, Kevin H. Alvarez, Wayne Luk, Maria Spiropulu ·

    HGQ-LUT: Fast LUT-Aware Training and Efficient Architectures for DNN Inference

    arXiv:2604.22293v1 Announce Type: cross Abstract: Lookup-table (LUT) based neural networks can deliver ultra-low latency and excellent hardware efficiency on FPGAs by mapping arithmetic operations directly onto the logic primitives. However, state-of-the-art LUT-aware training (L…

  2. arXiv cs.LG TIER_1 English(EN) · Chang Sun, Zhiqiang Que, Vladimir Loncar, Wayne Luk, Maria Spiropulu ·

    da4ml: Distributed Arithmetic for Real-time Neural Networks on FPGAs

    arXiv:2507.04535v2 Announce Type: replace-cross Abstract: Neural networks with a latency requirement on the order of microseconds, like the ones used at the CERN Large Hadron Collider, are typically deployed on FPGAs fully unrolled and pipelined. A bottleneck for the deployment o…

  3. arXiv cs.LG TIER_1 English(EN) · Maria Spiropulu ·

    HGQ-LUT: Fast LUT-Aware Training and Efficient Architectures for DNN Inference

    Lookup-table (LUT) based neural networks can deliver ultra-low latency and excellent hardware efficiency on FPGAs by mapping arithmetic operations directly onto the logic primitives. However, state-of-the-art LUT-aware training (LAT) approaches remain difficult to use in practice…