10 Models Tested: From 81.6% to 10%. The Free Tier is a Full-On Gamble.

AI Model Tests: Grok 4.3 Leads, Free Tiers Vary Widely

By the PulseAugur editorial team · [1 source] · 2026-05-26 22:42

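A recent test of ten AI models on coding tasks revealed sharp performance differences, especially among free tiers. Grok 4.3 was the top performer with an 81.6% success rate, while Perceptron Mk1 offered the best value, scoring nearly 80% at very low cost. Among free models, Owl Alpha stood out at 76.7% with no hard fails, although latency was a problem. Other models, such as GPT Chat Latest and Mistral Medium 3.5, were mixed: the former was the most expensive, and the latter suffered timeouts.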

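Impact: The results highlight significant cost and performance differences between AI models, especially among free tiers, which affects how developers choose models and tools.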

Ranking rationale: The article presents benchmark results for multiple AI models on coding tasks, comparing their performance and cost.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

dev.to (LLM tag) · English · Vilius · 2026-05-26 22:42

10 Models Tested: From 81.6% to 10%. The Free Tier is a Full-On Gamble.

By Vilius Vystartas | May 2026 I tested another 10 models across the same 10 agent coding tasks. Four of them were free-tier models — and the range was absurd: Owl Alpha scored 76.7% with zero hard fails, Laguna M.1 scored 10% and produced garbage on 9 out of 1…

