Pulse

last 48h

[3/3] 97 sources

What AI is actually talking about — clusters surfacing on Bluesky, Reddit, HN, Mastodon and Lobsters, re-ranked to elevate originality and crush noise.

MEME · r/LocalLLaMA English(EN) · 13h · REDDIT

Cheapest setup for >10 tok/sec for 120B dense LLM

A user on r/LocalLLaMA is looking for the cheapest hardware that can run a 120-billion-parameter dense large language model (LLM) at more than 10 tokens per second. The goal is fast responses for role-playing game campaigns, ideally with a 64,000-token context window and Q5 or Q6 quantization. They are weighing CPU-only, GPU-only, and mixed inference setups, and note the large VRAM requirements of GPU-based options.
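
To put the VRAM concern in numbers, here is a rough back-of-the-envelope sketch, not a benchmark. The bits-per-weight figures for Q5 and Q6 are assumed round values (real quantization formats vary), and the bandwidth estimate relies on the common rule of thumb that single-stream decoding of a dense model reads every weight once per token. The KV cache for a 64,000-token context is left out because it depends on the model architecture, which the post does not specify.

```python
# Rough sizing sketch; the constants marked "assumed" are illustrative, not measured.
PARAMS = 120e9        # 120B dense model, from the post
TARGET_TOK_S = 10     # target speed, from the post

# Assumed average bits per weight for each quantization level (hypothetical round figures).
ASSUMED_BITS_PER_WEIGHT = {"Q5": 5.5, "Q6": 6.5}


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (KV cache excluded)."""
    return params * bits_per_weight / 8 / 1e9


def min_bandwidth_gb_s(weights_gb: float, tok_s: float) -> float:
    """Memory bandwidth needed if every weight is read once per generated token."""
    return weights_gb * tok_s


for name, bpw in ASSUMED_BITS_PER_WEIGHT.items():
    gb = weight_gb(PARAMS, bpw)
    bw = min_bandwidth_gb_s(gb, TARGET_TOK_S)
    print(f"{name}: ~{gb:.1f} GB of weights, ~{bw:.0f} GB/s for {TARGET_TOK_S} tok/s")
```

Under these assumptions the weights alone come to roughly 80 to 100 GB, which is why the choice between CPU-only, GPU-only, and mixed setups dominates the cost question.
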
MEME · r/LocalLLaMA English(EN) · 14h · REDDIT

Does CPU matter for GPU inference?

A user on r/LocalLLaMA is building a PC for large language model (LLM) inference and wants to put most of the budget into GPUs while minimizing spending on other components. The core question is whether the CPU and RAM significantly affect inference performance when powerful GPUs are used, and specifically whether older or lower-tier CPUs carry a penalty.

MEME · r/LocalLLaMA English(EN) · 1d · REDDIT

Windows keeps crashing on rtx 3090

A user on r/LocalLLaMA reports frequent Windows crashes when running AI models on an RTX 3090 graphics card. The crashes occur under heavy load, even when VRAM utilization is not a factor, and persist across fresh Windows installations and updated drivers. The user notes that a less powerful RTX 3060 did not show this behavior, which suggests a problem with the RTX 3090 or its interaction with the system.
