Researchers have developed a bitwise systolic array architecture for runtime-reconfigurable multi-precision quantized neural networks (QNNs). It addresses a limitation of existing hardware multipliers, which cannot adjust their precision at runtime for mixed-precision QNN models. Implemented and tested on an Ultra96 FPGA, the design achieves speedups of 1.3185x to 3.5671x for mixed-precision model inference. It also has a shorter critical path delay, which allows clock frequencies of up to 250 MHz.
AI IMPACT This architecture could enable faster, more efficient inference of complex AI models on resource-constrained edge devices.
RANK_REASON The cluster contains an academic paper detailing a new hardware architecture for AI inference.
AI-generated summary · Google Gemini · from 1 source.