Brief · PulseAugur

TOOL · Mastodon — fosstodon.org Русский(RU) · 5h

How much hardware does an AI agent need? How we sized resources for an on-premise LLM, and why the calculators were off by a factor of five. By Sergey Smirnov, AI engineer and founder.

An AI engineer details the challenges of accurately estimating hardware requirements for on-premise LLM deployments. A popular calculator predicted 5,000 tokens/sec for a GPT-OSS-120B model on two RTX Pro 6000 Blackwell GPUs, but measured throughput was five times lower. The article explains how to assess LLM resource needs properly, especially on non-standard hardware, and describes a rigorous testing process for giving clients reliable performance guarantees.
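
The gap between a calculator's estimate and measured throughput can be expressed as a simple ratio. The sketch below is illustrative only and is not the author's method: the function names are hypothetical, and the timed runs are made-up numbers chosen to match the reported fivefold gap, not measurements from the article.

```python
from statistics import median


def tokens_per_second(runs):
    """Median generation rate over timed runs.

    runs: list of (generated_tokens, elapsed_seconds) pairs.
    Runs with a non-positive elapsed time are ignored.
    """
    rates = [tokens / seconds for tokens, seconds in runs if seconds > 0]
    if not rates:
        raise ValueError("no valid benchmark runs")
    return median(rates)


def overestimate_factor(predicted_tps, measured_tps):
    """How many times higher the prediction is than the measurement."""
    if measured_tps <= 0:
        raise ValueError("measured throughput must be positive")
    return predicted_tps / measured_tps


# Calculator estimate quoted in the brief.
PREDICTED_TPS = 5000

# ASSUMPTION: hypothetical timed runs, invented for illustration only.
hypothetical_runs = [(2000, 2.0), (3000, 3.0), (1000, 1.0)]

measured_tps = tokens_per_second(hypothetical_runs)
factor = overestimate_factor(PREDICTED_TPS, measured_tps)
print(f"measured: {measured_tps:.0f} tokens/sec, calculator off by {factor:.1f}x")
```

Using the median rather than the mean keeps a single slow or fast run from skewing the result. A real benchmark would also need to fix the prompt length, output length, and concurrency level, since throughput depends on all three.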

IMPACT Highlights the difficulty of accurately provisioning hardware for on-premise AI, which could affect enterprise adoption costs and timelines.

GPT-OSS-120B
RTX Pro 6000 Blackwell
Sergey V Smirnov