A Redditor has successfully run a 1-trillion-parameter LLM, specifically Kimi K2.5, locally on a single GPU workstation by utilizing 768GB of second-hand Intel Optane Persistent Memory modules as RAM. This setup achieved approximately 4 tokens per second, a performance deemed impressive given the hardware's budget constraints. The use of discontinued Optane DIMMs highlights a potential market gap for affordable, high-capacity memory solutions for large language model inference, especially as DRAM prices fluctuate. AI
Summary written by gemini-2.5-flash-lite from 2 sources. How we write summaries →
IMPACT Demonstrates a cost-effective method for running large LLMs locally, potentially influencing future hardware configurations for AI inference.
RANK_REASON User-driven application of existing hardware for a specific AI task.