PulseAugur

Redditor uses 768GB of used Optane RAM to run 1T-parameter LLM locally

A Redditor has run Kimi K2.5, a 1-trillion-parameter LLM, locally on a single-GPU workstation by using 768GB of second-hand Intel Optane Persistent Memory modules as RAM. The setup achieved approximately 4 tokens per second, a result considered impressive for budget hardware. The use of discontinued Optane DIMMs highlights a potential market gap for affordable, high-capacity memory for large language model inference, especially as DRAM prices fluctuate.

IMPACT Demonstrates a cost-effective method for running large LLMs locally, potentially influencing future hardware configurations for AI inference.

RANK_REASON User-driven application of existing hardware for a specific AI task.

AI-generated summary · Google Gemini · from 3 sources.

COVERAGE [3]

  1. Tom's Hardware TIER_1 English(EN) · Mark Tyson ·

    768GB of cheap Intel Optane DIMM memory sticks used to run 1-trillion-parameter LLM on a system with a single GPU — local Kimi K2.5 install achieved roughly 4 tokens per second

    A Redditor has caused a stir by coaxing a workstation build using Optane PMem DIMMs as RAM to run a 1-trillion parameter LLM.

  2. Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] ·

    768GB of cheap Intel Optane DIMM memory sticks used to run 1-trillion-parameter LLM on a system with a single GPU — local Kimi K2.5 install achieved roughly 4 tokens per second

    A Redditor has caused a stir by coaxing a workstation build using Optane PMem DIMMs as RAM to run a 1-t…

  3. r/singularity TIER_2 English(EN) · /u/Anen-o-me ·

    768GB of cheap Intel Optane DIMM memory sticks used to run 1-trillion-parameter LLM on a system with a single GPU — local Kimi K2.5 install achieved roughly 4 tokens per second

    https://www.reddit.com/r/singularity/comments/1tm1u3l/768gb_of_cheap_intel_optane_dimm_memory_sticks/