A new study from Microsoft Research, Stanford, Berkeley, and CMU finds that the listed per-token price of frontier reasoning models does not accurately reflect what they cost to run. In more than 20% of comparisons, the model with the lower advertised price was the more expensive one to use, in one instance costing 28x more. The primary driver of this discrepancy is the variable consumption of "thinking tokens," which make up a significant portion of total output cost and can fluctuate unpredictably even for the same query on the same model.
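To make the arithmetic behind this concrete, here is a minimal sketch of how a lower sticker price can still produce a higher bill. It is not taken from the study: the `Pricing` class, the `query_cost` function, the two models, their prices, and all token counts are hypothetical, and the sketch assumes thinking tokens are billed at the output rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical list prices in USD per million tokens."""
    input_per_mtok: float
    output_per_mtok: float


def query_cost(pricing: Pricing, input_tokens: int,
               thinking_tokens: int, answer_tokens: int) -> float:
    """Cost of one query in USD.

    Assumption: thinking tokens are billed as output tokens, so the
    billed output is thinking tokens plus visible answer tokens.
    """
    billed_output = thinking_tokens + answer_tokens
    total = (input_tokens * pricing.input_per_mtok
             + billed_output * pricing.output_per_mtok)
    return total / 1_000_000


# Illustrative numbers only, not measurements from the study.
cheap_listed = Pricing(input_per_mtok=1.0, output_per_mtok=4.0)
pricey_listed = Pricing(input_per_mtok=3.0, output_per_mtok=12.0)

# Same prompt and same answer length; only the thinking budget differs.
cost_cheap = query_cost(cheap_listed, input_tokens=1_000,
                        thinking_tokens=20_000, answer_tokens=500)
cost_pricey = query_cost(pricey_listed, input_tokens=1_000,
                         thinking_tokens=2_000, answer_tokens=500)

print(f"lower list price:  ${cost_cheap:.3f} per query")
print(f"higher list price: ${cost_pricey:.3f} per query")
```

With these made-up figures, the model listed at one third of the price ends up about 2.5 times more expensive per query, because it spends ten times as many thinking tokens. Since thinking-token counts can vary between runs of the same query, a realistic estimate would average the cost over repeated measurements rather than rely on a single call.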
IMPACT Developers building on LLMs must budget for variable operating costs: sticker prices can be misleading, and the gap between listed and actual cost can erode profit margins.
RANK_REASON The cluster reports on a new study comparing the listed vs. actual costs of running frontier reasoning models.
AI-generated summary · Google Gemini · from 2 sources.