Researchers have developed ViM-Q, an algorithm-hardware co-design for accelerating Vision Mamba (ViM) inference on FPGAs. The approach addresses two challenges: quantizing activations with dynamic outliers, and adapting state space model (SSM) computation to FPGA architectures. ViM-Q pairs custom 4-bit weight quantization with a hardware accelerator built around a linear engine and a pipelined SSM engine, and it supports runtime configuration for different ViM models. Tests on an AMD ZCU102 FPGA showed significant speedup and energy-efficiency gains over a GPU baseline for low-batch inference.
Impact: Enables efficient deployment of Vision Mamba models on resource-constrained edge devices.
Ranking rationale: Academic paper detailing a new algorithm-hardware co-design for model inference.
AI-generated summary · Google Gemini · from 1 source.
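
For readers unfamiliar with the 4-bit weight quantization mentioned in the summary, here is a minimal sketch of the general idea. The summary does not describe ViM-Q's actual quantizer, so this shows a generic symmetric per-row scheme, not the paper's method. The function names `quantize_int4` and `dequantize` are hypothetical and chosen for illustration only.

```python
def quantize_int4(weights):
    """Symmetric per-row quantization to signed 4-bit integers in [-8, 7].

    Generic illustration only; NOT the ViM-Q quantizer, whose details
    are not given in the summary.
    """
    quantized, scales = [], []
    for row in weights:
        max_abs = max(abs(w) for w in row)
        # Map the largest magnitude in the row to 7; fall back to a scale
        # of 1.0 for an all-zero row to avoid dividing by zero.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        # Round to the nearest integer, then clamp to the 4-bit range.
        q_row = [max(-8, min(7, round(w / scale))) for w in row]
        quantized.append(q_row)
        scales.append(scale)
    return quantized, scales


def dequantize(quantized, scales):
    """Recover approximate floating-point weights from 4-bit codes."""
    return [[q * s for q in row] for row, s in zip(quantized, scales)]


if __name__ == "__main__":
    q, s = quantize_int4([[3.5, -1.0, 0.6], [0.0, 0.0, 0.0]])
    print(q, s)
    print(dequantize(q, s))
```

Each row stores one floating-point scale plus 4-bit integer codes, which is what shrinks memory traffic on a device such as an FPGA. The cost is rounding error: in the example, 0.6 comes back as 0.5 after dequantization.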