A Redditor has run Kimi K2.5, a 1-trillion-parameter LLM, on a custom-built system that uses 768GB of second-hand Intel Optane Persistent Memory modules as RAM. The setup, which pairs a single RTX 3060 GPU with a Xeon CPU, achieved approximately 4 tokens per second. The user relied on Optane's lower latency compared to SSDs and its low price on the used market, which points to a potential niche for such memory in LLM inference now that Intel has discontinued the Optane product line.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Demonstrates that large-scale LLMs can be run on inexpensive second-hand hardware with creative memory solutions, potentially lowering the barrier to entry for local inference.
RANK_REASON User-driven hardware configuration demonstrating a novel application of existing technology for LLM inference.
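The figures in the summary (1 trillion parameters, 768GB of memory, roughly 4 tokens per second) invite a quick back-of-envelope check. The sketch below is only an illustration: the bit widths are hypothetical quantization levels, since the source does not say how the weights were stored, and GB is treated as 10^9 bytes. It estimates the weight footprint at each width and the memory read rate that 4 tokens per second would imply if every weight were read for every token.

```python
# Back-of-envelope sizing for the setup described in the summary.
# Only these three figures come from the source; everything else is assumed.
TOTAL_PARAMS = 1e12   # "1-trillion-parameter LLM"
CAPACITY_GB = 768     # Optane capacity used as RAM
TOKENS_PER_S = 4      # reported throughput (approximate)


def weights_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 10^9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def fits(params: float, bits_per_weight: float, capacity_gb: float) -> bool:
    """True if the weights alone fit in the given capacity."""
    return weights_gb(params, bits_per_weight) <= capacity_gb


def implied_read_gb_per_s(params_read_per_token: float,
                          bits_per_weight: float,
                          tokens_per_s: float) -> float:
    """Read rate needed if this many weights are read for each token."""
    return weights_gb(params_read_per_token, bits_per_weight) * tokens_per_s


if __name__ == "__main__":
    # Hypothetical bit widths; the source does not state the quantization.
    for bits in (16, 8, 4):
        size = weights_gb(TOTAL_PARAMS, bits)
        print(f"{bits:>2}-bit weights: {size:7.1f} GB, "
              f"fits in {CAPACITY_GB} GB: {fits(TOTAL_PARAMS, bits, CAPACITY_GB)}")

    # Upper bound: every weight read once per token at 4 bits per weight.
    dense = implied_read_gb_per_s(TOTAL_PARAMS, 4, TOKENS_PER_S)
    print(f"All weights read per token at 4-bit: {dense:.1f} GB/s")
```

Under these assumptions the weights fit in 768GB only at well below 8 bits per weight, and reading all of them for every token would require a read rate far beyond what a single-socket memory system is likely to sustain. That suggests only a fraction of the weights is read per token, though the source does not describe how the model was configured.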