This guide explains how to run Large Language Models (LLMs) on AMD NPUs using FastFlowLM on Fedora Linux. Because pre-built packages are not available for Fedora, the four-layer setup requires building XRT, the NPU plugin, and FastFlowLM from source. The main pitfalls are making sure the IOMMU is enabled and symlinking the XRT components correctly. The guide gives step-by-step instructions for installing dependencies, building and installing XRT and the NPU plugin, and configuring memory lock limits, and it stresses that the `amd_iommu=off` kernel parameter must not be set.
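The two configuration checks mentioned above (no `amd_iommu=off` on the kernel command line, and a usable memory lock limit) can be sketched as a small pre-flight script. This is a minimal illustration, not part of the original guide: the function names are hypothetical, the script only reads `/proc/cmdline` and the current process's `RLIMIT_MEMLOCK`, and it does not assume any particular limit value, since the summary does not state one.

```python
import resource


def iommu_disabled(cmdline: str) -> bool:
    """Return True if the kernel command line contains the exact token amd_iommu=off."""
    return "amd_iommu=off" in cmdline.split()


def memlock_limit_bytes():
    """Return the soft RLIMIT_MEMLOCK in bytes, or None if it is unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return None if soft == resource.RLIM_INFINITY else soft


def report(cmdline_path: str = "/proc/cmdline") -> None:
    """Print the result of both checks (hypothetical helper for illustration)."""
    try:
        with open(cmdline_path) as f:
            cmdline = f.read()
    except OSError as exc:
        print(f"could not read {cmdline_path}: {exc}")
    else:
        if iommu_disabled(cmdline):
            print("amd_iommu=off is set; the guide says to remove it")
        else:
            print("amd_iommu=off is not set")

    limit = memlock_limit_bytes()
    if limit is None:
        print("memlock limit: unlimited")
    else:
        print(f"memlock limit: {limit} bytes")


if __name__ == "__main__":
    report()
```

The command-line check matches whole tokens, so a setting such as `amd_iommu=on` is not flagged. The memlock value reported is the limit for the process running the script, which may differ from the limit applied to a service or another login session.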
IMPACT Enables running LLMs on AMD NPUs, potentially expanding hardware options for AI inference.
RANK_REASON Guide on setting up specific hardware and software for a particular task.
AI-generated summary · Google Gemini · from 1 source.