Two new code generation models, Claude Fable 5 and Kimi 2.7, have debuted on the DeepSWE benchmark, which assesses AI systems' capabilities on software engineering tasks. Both models are now available for evaluation on the benchmark, and their DeepSWE results should show how effectively each one generates and understands code.
Impact: Two new models are being evaluated on DeepSWE, offering a measure of their code generation capabilities.
AI-generated summary · Google Gemini · from 1 source.