AMD Strix Halo NPUs Now Usable for LLM Inference with Lemonade Software

By PulseAugur Editorial · [1 source] · 2026-06-24 15:16

Lemonade, a newly released software tool, enables the Neural Processing Unit (NPU) on AMD Strix Halo devices to run large language models. It supports hybrid models that use the NPU for rapid prompt processing and the integrated GPU for parallel execution, which the source post describes as a significant performance improvement. For users who bought these devices a year ago, this means the full hardware can now be put to work for LLM inference.

IMPACT Enables faster LLM inference on AMD Strix Halo devices by utilizing NPUs for prompt processing.

RANK_REASON A new software tool enables previously underutilized hardware for LLM inference.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

r/LocalLLaMA TIER_1 English(EN) · /u/CSEliot · 2026-06-24 15:16

Big News for AMD / Strix Halo+ Owners

Admittedly this is news for me, but I'm hoping it could be of some use to others here as well! So, THE NPU IS USABLE!! I've owned an AMD Ryzen 395 Max AI+ (or whatever the naming is lol) for about a year now and have relied solely o…

