How much hardware does an AI agent need? How we sized resources for an on-premise LLM, and why the calculators were off by a factor of five. By Sergey Smirnov, AI Engineer and Founder.
An AI engineer details the challenges of accurately estimating hardware requirements for on-premise LLM deployments. A popular calculator predicted 5,000 tokens/sec for the GPT-OSS-120B model on two RTX Pro 6000 Blackwell GPUs, but measured throughput was about five times lower (roughly 1,000 tokens/sec). The article explains how to assess LLM resource needs properly, especially on non-standard hardware, and describes a rigorous testing process that lets the team give clients reliable performance guarantees.
IMPACT: Highlights how hard it is to provision hardware accurately for on-premise AI, which can affect enterprise adoption costs and timelines.
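The gap between the calculator's estimate and the measured result can be expressed as a simple ratio. The sketch below is illustrative only: the function names and the benchmark figures (60,000 tokens generated in 60 seconds) are hypothetical, chosen to match the "five times" gap in the summary, and are not taken from the article's actual test runs.

```python
def measured_throughput(total_tokens: int, elapsed_seconds: float) -> float:
    """Tokens per second observed in a benchmark run."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return total_tokens / elapsed_seconds


def overestimate_factor(predicted_tps: float, measured_tps: float) -> float:
    """How many times the calculator's estimate exceeds the measured throughput."""
    if measured_tps <= 0:
        raise ValueError("measured_tps must be positive")
    return predicted_tps / measured_tps


# Calculator estimate quoted in the summary.
predicted = 5000.0

# Hypothetical benchmark run (assumed numbers, not from the article).
measured = measured_throughput(total_tokens=60_000, elapsed_seconds=60.0)

factor = overestimate_factor(predicted, measured)
print(f"measured: {measured:.0f} tokens/sec, calculator overshoot: {factor:.1f}x")
```

Sizing hardware from the measured figure rather than the calculator's estimate is what makes a performance guarantee defensible.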