A user on Reddit is seeking advice on building a local inference server with a budget of $150,000. Their current production server uses four H100 GPUs, and they are looking for a comparable or better alternative, considering the H100s are nearing the end of their product cycle. The user is prioritizing cost-effectiveness for inference and needs the server to handle large models like 122b AWQ at a 256k context length with a TP of 2, in addition to a small embedding model. AI
RANK_REASON User-generated content on a forum asking for advice, not a news report.
AI-generated summary · Google Gemini · from 1 sources. How we write summaries →