PulseAugur

User considers 4x Ascend GX10 GPUs for future open-source LLMs

A user on the r/LocalLLaMA subreddit is considering buying four Ascend GX10 GPUs to run future open-source large language models, such as a potential "fable 5" release. They cite benchmarks reported by others running GLM5.2 on similar hardware (4x DGX Sparks): 400-500 tokens/second for prompt processing and about 15 tokens/second for output at a 128k context window. The user concedes this is not blazing fast but considers it usable, especially with quantization, and wants to be prepared for upcoming models.
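To put the quoted throughput in wall-clock terms, here is a rough back-of-the-envelope sketch. It assumes that "128k" means a prompt of 128,000 tokens that fills the context, that the quoted rates hold constant over the whole run, and that a reply is 1,000 tokens long; none of these assumptions comes from the post, and the function names are purely illustrative.

```python
# Back-of-the-envelope latency estimate from the throughput figures quoted above.
# Assumptions (not from the post): the prompt is 128,000 tokens, the rates are
# constant for the whole run, and the reply is 1,000 tokens long.

def seconds_to_process(prompt_tokens: int, tokens_per_second: float) -> float:
    """Time to ingest a prompt at a fixed prompt-processing rate."""
    return prompt_tokens / tokens_per_second


def seconds_to_generate(output_tokens: int, tokens_per_second: float) -> float:
    """Time to emit a reply at a fixed generation rate."""
    return output_tokens / tokens_per_second


PROMPT_TOKENS = 128_000   # assumed reading of "128k context"
OUTPUT_TOKENS = 1_000     # hypothetical reply length
OUTPUT_RATE = 15          # tokens/second, as quoted

for prompt_rate in (400, 500):  # tokens/second, the quoted range
    prefill = seconds_to_process(PROMPT_TOKENS, prompt_rate)
    decode = seconds_to_generate(OUTPUT_TOKENS, OUTPUT_RATE)
    print(
        f"prompt at {prompt_rate} tok/s: {prefill / 60:.1f} min, "
        f"reply: {decode / 60:.1f} min"
    )
```

Under these assumptions, a full 128k prompt takes roughly four to five and a half minutes to process before the first output token appears, and a 1,000-token reply adds a little over a minute. Shorter prompts scale down proportionally.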

IMPACT Potential users are evaluating hardware configurations for running future open-source LLMs.

RANK_REASON User discussion about hardware for running LLMs.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. r/LocalLLaMA TIER_1 English(EN) · /u/chikengunya ·

    Thinking about grabbing 4x Ascend GX10s

    Some in this sub have tested GLM5.2 on 4x DGX Sparks (or Ascend GX10) with 400-500 tok/s prompt processing and ~15 tok/s output at 128k context. Not blazing fast, but usable imo, especially with quantization.

    My thinking: If there's an ope…