PulseAugur

Enthusiast uses 768GB Optane RAM to run 1T-parameter LLM locally

A Redditor has run Kimi K2.5, a 1-trillion-parameter LLM, on a custom-built system that uses 768GB of second-hand Intel Optane Persistent Memory modules as RAM. The setup, which pairs a single RTX 3060 GPU with a Xeon CPU, achieved approximately 4 tokens per second. The user relied on Optane's lower latency compared to SSDs and its low price on the used market. The build points to a potential niche for such memory in LLM inference, now that Intel has discontinued its Optane product line.

Summary written by gemini-2.5-flash-lite from 1 source.
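
As a rough check on the headline figures, the sketch below estimates how much memory the weights of a 1-trillion-parameter model need at a few precisions and compares that with 768GB. The bit widths are illustrative assumptions: the source does not state which quantization the Redditor used. The estimate also treats 768GB as decimal gigabytes and ignores the KV cache, activations, and anything offloaded to the GPU.

```python
def weights_footprint_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 1e12  # "1-trillion-parameter" figure from the summary
MEMORY_GB = 768    # Optane capacity from the summary

# Bit widths are assumptions for illustration, not the build's actual settings.
for bits in (16, 8, 4):
    size = weights_footprint_gb(NUM_PARAMS, bits)
    verdict = "fits" if size <= MEMORY_GB else "does not fit"
    print(f"{bits:>2} bits/weight: {size:,.0f} GB -> {verdict} in {MEMORY_GB} GB")

# Largest average bits per weight that still fits the weights alone.
max_bits = MEMORY_GB * 1e9 * 8 / NUM_PARAMS
print(f"Break-even: about {max_bits:.2f} bits per weight")
```

Under these assumptions, weights stored at 8 bits or more would exceed 768GB, so a build like this would need the weights to average a little over 6 bits each or fewer.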

IMPACT Suggests that trillion-parameter LLMs can run on second-hand workstation hardware with creative memory configurations, potentially lowering the barrier to entry for local inference.

RANK_REASON User-driven hardware configuration demonstrating a novel application of existing technology for LLM inference.


COVERAGE [1]

  1. Tom's Hardware TIER_1 · Mark Tyson

    768GB of cheap Intel Optane DIMM memory sticks used to run 1-trillion-parameter LLM on a system with a single GPU — local Kimi K2.5 install achieved roughly 4 tokens per second

    A Redditor has caused a stir by coaxing a workstation build that uses Optane PMem DIMMs as RAM into running a 1-trillion-parameter LLM.