Unhinged results from UC Berkeley's new ALE benchmark of 55 different industries
UC Berkeley's new ALE benchmark reports large cost and runtime disparities among AI models across 55 industries. It finds that custom harnesses can outperform commercial offerings like Codex, and that models like Anthropic's Claude Opus 4.8 are significantly slower and more expensive than previous versions while delivering similar results. The findings point to a highly variable, unoptimized AI market in which users need to benchmark directly to find the most cost-effective and efficient model for their specific workloads.
IMPACT Highlights extreme cost and runtime inefficiencies in current AI models; users need to benchmark models on their own workloads to find the best-performing option.