Researchers have introduced Multi-LCB, a new benchmark designed to evaluate the code-generation capabilities of large language models (LLMs) across twelve programming languages rather than Python alone. It addresses the Python-centric nature of the existing LiveCodeBench (LCB) by transforming LCB's Python tasks into equivalent problems in other languages while maintaining LCB's contamination controls. Initial evaluations of 24 LLMs on Multi-LCB revealed evidence of Python overfitting, language-specific contamination, and significant performance disparities in multilingual coding.
AI IMPACT This benchmark will help identify and address LLM limitations in multilingual code generation, pushing for more robust and versatile AI coding assistants.
RANK_REASON The cluster describes a new academic benchmark for evaluating LLMs on code generation, presented in a research paper.
AI-generated summary · Google Gemini · from 2 sources.