A user on the r/LocalLLaMA subreddit is considering buying four ASUS Ascent GX10 systems to run future open-source large language models, such as a potential "fable 5" release. They cite benchmarks from others running GLM5.2 on similar hardware (4x DGX Sparks): 400-500 tokens/second for prompt processing and around 15 tokens/second for output at a 128k context window. The user acknowledges this isn't blazing fast but finds it usable, especially with quantization, and wants to be prepared for upcoming models.
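To put the quoted throughput in wall-clock terms, here is a rough back-of-the-envelope sketch. It assumes that "128k" means 128,000 tokens, that the whole window is filled by the prompt, and that the quoted rates hold steady for the entire run. The summary confirms none of these assumptions, and the 1,000-token reply length is a hypothetical value chosen only for illustration.

```python
def seconds_to_process(tokens: int, tokens_per_second: float) -> float:
    """Return the time in seconds to handle `tokens` at a constant rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return tokens / tokens_per_second


# Figures quoted in the summary.
PROMPT_RATE_LOW = 400.0   # tokens/second, lower bound for prompt processing
PROMPT_RATE_HIGH = 500.0  # tokens/second, upper bound for prompt processing
OUTPUT_RATE = 15.0        # tokens/second for generation

# Assumptions, not stated in the summary.
CONTEXT_TOKENS = 128_000  # "128k" read as 128,000 tokens, fully used by the prompt
REPLY_TOKENS = 1_000      # hypothetical reply length

prefill_slow = seconds_to_process(CONTEXT_TOKENS, PROMPT_RATE_LOW)
prefill_fast = seconds_to_process(CONTEXT_TOKENS, PROMPT_RATE_HIGH)
reply_seconds = seconds_to_process(REPLY_TOKENS, OUTPUT_RATE)

print(f"Prompt processing: {prefill_fast / 60:.1f} to {prefill_slow / 60:.1f} minutes")
print(f"Generating {REPLY_TOKENS} tokens: about {reply_seconds:.0f} seconds")
```

Under these assumptions, ingesting a full-window prompt takes several minutes before the first output token appears, and a reply of that length takes roughly a minute more. That is consistent with the user's view of the setup as usable but not fast.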
IMPACT Potential users are evaluating hardware configurations for running future open-source LLMs.
RANK_REASON User discussion about hardware for running LLMs.
AI-generated summary · Google Gemini · from 1 source.