PulseAugur

User seeks $150K local inference server advice

A Reddit user on r/LocalLLaMA is seeking advice on buying a production-class local inference server for under $150,000, intended as a failover for an existing production server that serves about 300 people. The current server uses four H100 GPUs, and the user wants comparable or better hardware, given that the H100 is nearing the end of its product cycle. The priority is cost-effective inference: the server must run large models, such as a 122B-parameter AWQ-quantized model at a 256K-token context length with tensor parallelism (TP) of 2, alongside a small embedding model.
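As a rough aid for reading those requirements, the sketch below estimates the memory taken by model weights and by the KV cache. It is a back-of-the-envelope calculation, not the poster's method: it assumes AWQ means 4-bit weights and ignores quantization scales and runtime overhead, and the layer count, KV-head count, and head dimension are hypothetical placeholders because the post does not name the model architecture.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: int = 2) -> float:
    """Approximate KV cache for one full-length sequence, in GB.

    The leading factor of 2 accounts for storing both keys and values.
    """
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value / 1e9


# 122B parameters at an assumed 4 bits per weight (typical for AWQ).
weights = weight_memory_gb(122, 4)

# Tensor parallelism of 2 splits the weights across two GPUs.
weights_per_gpu = weights / 2

# HYPOTHETICAL architecture values, for illustration only.
kv = kv_cache_gb(layers=48, kv_heads=8, head_dim=128, context_tokens=256 * 1024)

print(f"weights: {weights:.1f} GB total, {weights_per_gpu:.1f} GB per GPU at TP=2")
print(f"KV cache for one 256K-token sequence (hypothetical model): {kv:.1f} GB")
```

With real architecture numbers substituted in, the same two functions indicate how much GPU memory remains for concurrent requests once the weights are loaded.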

RANK_REASON User-generated content on a forum asking for advice, not a news report.

Read on r/LocalLLaMA →

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. r/LocalLLaMA TIER_1 English(EN) · /u/Porespellar ·

    If you had $150K for building a production-class local inference server to serve 300 people, what would you buy?

    I know we usually focus on home lab stuff here for the most part, but I’m in a position where I’m trying to purchase a failover server for our production inference server for under $150K. Our main production server has 4 H100s, so I’m looking for…