PulseAugur

Anthropic's Claude 3.5 Sonnet achieves 92% on HumanEval benchmark

Anthropic's Claude 3.5 Sonnet model has achieved a 92% pass rate on the HumanEval coding benchmark, a significant improvement over previous models. The performance was showcased through Claude.ai's Artifacts feature, which displays the model's generated code. The result suggests a leap forward in AI's ability to understand and generate complex code.
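For context, HumanEval scores are conventionally reported with the unbiased pass@k estimator from the benchmark's original paper: generate n samples per problem, count the c that pass the unit tests, and estimate the chance that at least one of k samples passes. A minimal sketch (the 92% figure above is presumably pass@1; the exact sampling setup is not stated in the source):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator used for HumanEval.

    n: samples generated per problem
    c: samples that pass the problem's unit tests
    k: evaluation budget
    Returns 1 - C(n-c, k) / C(n, k).
    """
    if n - c < k:
        # Fewer than k failing samples: some passing sample
        # is guaranteed in every size-k draw.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# 10 samples per problem, 9 pass -> pass@1 = 0.9
print(pass_at_k(10, 9, 1))
```

The benchmark score is then the mean of pass@k over all 164 HumanEval problems.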

Summary written by gemini-2.5-flash-lite from 1 source.

RANK_REASON The article reports on a specific benchmark result for a model, which falls under research.

Read on Smol AINews →

COVERAGE [1]

  1. Smol AINews TIER_1

    Claude Crushes Code - 92% HumanEval and Claude.ai Artifacts

    **Claude 3.5 Sonnet**, released by **Anthropic**, is positioned as a Pareto improvement over Claude 3 Opus, operating at **twice the speed** and costing **one-fifth** as much. It achieves state-of-the-art results on benchmarks like **GPQA, MMLU, and HumanEval**, surpassing even *…