A new benchmark, SciCode, has been developed to evaluate AI models on complex STEM reasoning tasks, building on the existing HumanEval benchmark. It aims to provide a more rigorous assessment of AI capabilities in scientific and mathematical domains, marking a push toward testing that goes beyond general coding proficiency.
Summary written by gemini-2.5-flash-lite from 1 source.