LocalLLaMA user seeks fast memory retriever for Hermes on NPUs

By PulseAugur Editorial · [1 source] · 2026-05-26 20:10

A user on r/LocalLLaMA is asking for recommendations for a fast, local memory retriever to use with Hindsight/Hermes, ideally one that can run on a Strix Halo NPU. By their account, GPT OSS 20B would be a good fit based on rankings they describe as outdated, but it is too slow on the NPU for the throughput that pulling memories requires. The user is also interested in optimizing agent subtasks with small models such as Bonsai 1 bit or LFM and is looking for community input.

IMPACT A user is exploring ways to speed up local LLM performance for agent subtasks, a sign of interest in more efficient on-device AI processing.

RANK_REASON User is asking for recommendations on a forum, not announcing a new product or research.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

r/LocalLLaMA TIER_1 English(EN) · /u/Miserable-Dare5090 · 2026-05-26 20:10

Fast little local memory retriever for Hermes

As title says. Looking for suggestions of a good memory retriever (for use with hindsight/hermes) ideally that can run on a strix halo NPU. GPT OSS 20B would be good based on their outdated rankings but it’s slow on the NPU for this type of task …
